PREFACE TO THE SECOND EDITION

Striking examples of the utility and scope of the science of sta- 
tistics have occurred in recent years. As Professor Harold Hotelling 
remarks (Annals of Mathematical Statistics, vol. 11, 1940, pp. 457-470):

Indeed it seems as if the exploitation of the business and 
manufacturing possibilities of statistical methods has only 
begun and that limitless further fields are coming into view. 

The widespread use of statistical methods and the gratifying in- 
terest shown in the present book have made possible a second edition 
at this time. This opportunity has been used to polish and clarify 
certain portions of the text. For suggestions leading to the excision 
of obscurities I am indebted to many of my students and to a number 
of friends in other universities, particularly to Professors Irving W. 
Burr, John H. Curtiss, Henry Scheffé, Guy G. Speeker, and Howard
E. Wahlert. Of course, full responsibility for any remaining errors 
or other defects is my own. 

J. F. K.

PREFACE TO THE FIRST EDITION

The field of statistics is many sided and ranges over different levels. 
However, between the levels of clerical work at one extreme and 
mathematical research at the other extreme, there is a well-defined 
methodology, mathematical in nature, which underlies the specialized 
applications in the departments of economics, psychology, education, 
and biology. 

This book is an elementary text dealing with the mathematics of 
statistics. Fortunately, a considerable part of the descriptive meth- 
odology of statistics can be understood by those having relatively 
little knowledge of college mathematics. Although no mathematics 
beyond the ordinary Freshman course in college algebra is required 
for a profitable reading of this text, a certain degree of mathematical 
maturity and intelligence is presupposed. To achieve the maximum 
success perhaps only the best of those students whose mathematical 
preparation is limited to the minimum prerequisite should be encour- 
aged to study it. Occasionally, material is introduced to sharpen 
the interest and challenge the ability of the more advanced student 
without interrupting the main developments or discouraging those 
less mature. 

In writing this book, considerable selection of material necessarily 
had to be made. The omission of certain topics will be noted in the 
table of Contents. Judging from my own experience, and that of 
others, the theory of sampling cannot be taught satisfactorily at the 
level for which Part I is intended. At best only a superficial use of 
formulas could be hoped for. Consequently, I have elected to defer 
this subject to Part II where a systematic treatment can be given. 
With regard to time series analysis, Professor J. Neyman says in his 
Lectures And Conferences On Mathematical Statistics (p. 106), 

We start by trying to split each of the series into several parts, which we
arbitrarily assume to be additive. One of these parts is the trend, which we
estimate perhaps by fitting a low order parabola to the whole series available.
The next part is the “business cycle.” The third part is the “seasonal
variation,” which we frequently estimate by calculating moving averages. Finally,
the remainder is considered to arise from random causes, and we concentrate
on the question whether such a remainder in one of the variables is correlated
with that in some other. All this procedure seems to me very artificial and
arbitrary. ... In my opinion the whole problem of time series must be treated
from a point of view that is quite different from the traditional one just described.


I concur in this opinion and I believe that no useful purpose would 
be served by drilling students in the traditional procedures. 

Throughout the book the student is encouraged and stimulated to 
master fundamental principles and concepts. Essentially, the job 
of every statistician is to take hold of situations and disentangle 
them by the techniques of the science. Therefore, considerable 
emphasis is placed on technique. I have tried to develop in the 
student the ability to use symbolism creatively as a language.
Numerous examples are given to clarify concepts and illustrate 
processes. Over two hundred exercises are included. It is intended 
that these exercises should be handled as in a mathematics course. 
No laboratory, so-called, is necessary. 

Nowadays, no little importance is attached to motivation. I have 
constantly held in mind the necessity of making the subject interest- 
ing and stimulating to the beginning student. Nevertheless, I ven- 
ture the opinion that the best motivation for intelligent students is 
the feeling that their teacher knows his subject. 

In preparing the manuscript a large number of books and papers 
have been examined and perhaps leaned upon. No claim to origi- 
nality is made except possibly in the matter of arrangement and 
pedagogical approach. Numerous references to the scholarly achievements
of others are cited. It is hoped that the serious student will
read some of these and thereby widen his perspective and enhance
his interest.

In conclusion, I wish to express my deep appreciation to Professor 
Allen T. Craig and Dr. Mason E. Wescott who critically read the
manuscript and made many suggestions for its improvement. 


Evanston, Illinois. 
April, 1939 


John F. Kenney

MATHEMATICS OF STATISTICS

INTRODUCTION 

1. Definition. The word statistics is used in at least two different 
senses. Construed as plural it refers to the systematic presentation
of quantitative data. Used in a singular sense, the word statistics 
refers to the science which has for its object the classification and 
analysis of quantitative data so that intelligent judgments may be 
passed upon them. 

It is usually clear from the context which meaning 1 is intended, 
although some persons prefer the expression statistical methods 
for this second meaning. Statistical methods are all those devices 
used in the collection and analysis of data. The theory of statis- 
tics is the exposition of statistical methods and is of a mathematical 
nature. 

2. Scope. There used to be a widespread misapprehension that 
statistics is a branch of economics. As a matter of fact, statistical 
problems arise in many different fields — biology, economics, engi- 
neering, insurance, education, physics, and astronomy, as well as 
various branches of business. The exploration of certain aspects of 
nearly every field involves some phase of statistical theory. Indeed, 
certain types of statistical methodology may have almost unexpected 
applications — the discovery, for example, that the life of physical
property 2 is governed by much the same statistical rules as govern 
the lives of human beings, and hence, that life tables may be applied 
to both. Physicists have discovered that many of the problems in 
the modern theory of the structure of the atom are essentially sta- 
tistical in nature. In recent years industrial companies have placed 
an increasing reliance on statistical methods in controlling the 
quality of goods during manufacture. 

1 In addition to the two meanings given above, another has crept into the
recent literature where reference is made to a statistic. This term will be
explained later.

2 Life Expectancy of Physical Property — E. B. Kurtz. Ronald Press.

Statistics as a science is making contributions to all the sciences.
On the other hand, some sciences like biometry and physics have
contributed much in the development of statistics and its terminology.
The following quotation from Science may appropriately be men- 
tioned here: 

The extension of the scope of quantitative methods through the medium of
statistical analysis is one of the most significant things going on in the scientific 
world at the present time. 1 

The importance of statistical method in present-day thinking has 
been well stated, as follows: 

More and more the modern temper relies upon statistical method in its attempts
to understand and to chart the workings of the world in which we live.
Particularly in those sciences which deal with human beings, whether in their 
physical and biological aspects or in their social, economic, and psychological 
relations, the spirit of our time asks that its conclusion be based not so much 
upon the distinctive reactions of one or two individuals as upon the observation 
of large numbers of individuals, the measurement of their common likenesses and 
the extent of their diversity. As the data thus gathered from mass phenomena 
become extensive, it becomes imperative to have methods of organization to 
bring the facts within the compass of our understanding, methods of analysis 
to make the essential relations appear out of the mass of detail in which they 
are hidden, and methods of classification and description to facilitate the pres- 
entation of the data for the study and consideration of other persons. Thus 
statistical method becomes a telescope through which we can study a larger 
terrain than would be accessible to our unaided vision. 2 

3. Statistical Methods in the Social Sciences. Because statistics 
is fundamentally the study of aggregates of individuals, rather than 
of individuals, whether these individuals be observations or measure- 
ments or persons, it is apparent that statistical methods are essential 
to social studies. Indeed it has been said that it is principally by the 
aid of such methods that these studies may be raised to the rank of 
sciences. 

This particular dependence of social studies upon statistical methods 
is mentioned in a recent book 3 from which we quote the following: 

If, as seems probable, our present uncoordinated large-scale business is to be 
further developed into an efficiently managed instrument of production serving 
the needs of the people, then statistics, together with mathematical economics, 
will emerge among the most important tools of the social sciences. For it is by
means of averages, dispersions, coefficients of variability, trends, and regressions,
as pictured in control charts, that management is able to visualize and direct the
movements of large masses of population.

1 Science. January 18, 1929.

2 Mathematics and Statistics — Walker. Sixth Yearbook, National Council of
Teachers of Mathematics.

3 Reprinted by permission from Methods of Statistical Analysis by Davies and
Crowder, published by John Wiley and Sons, Inc.

The work of the statistician is much like that of the map maker who presents 
the traveler with a sketch of important highways, showing the locations of towns 
and geographical features. The map is not a picture of reality. It shows cities 
as dots, and rivers as lines. It has purposely omitted the interesting details of 
scenery and the still more important features of human interest which lie along 
the route and which constitute the traveler's real objectives. Nevertheless, as 
a means of reaching these objectives, the map is extremely useful. And so it is 
with statistics in the hands of the business executive and statesman. Back of the 
charts are human beings with their varying characteristics and vital interests, 
few of which can be described in figures. Yet as a means of serving these interests, 
of keeping trade moving from one region to another, of allocating investment and 
labor, and of apportioning relief to maladjusted industries and dependent classes, 
statistics and mathematical methods are important, and are becoming increas- 
ingly important with the growing complexity of society. 

It may be said that the study of statistics is not merely an attempt to de- 
scribe what actually occurs, though it must begin at this point, but in its broader 
aspects it is the logical background of business and social management. Hence 
what appears now to be mere abstraction may later become the basic necessity 
of an applied science. Eventually, it may be assumed, the social arts of business 
and politics will rest upon as substantial a theoretical and mathematical back- 
ground as physics, chemistry, and engineering. 

4. Mathematics and Statistics. Statistical problems are of inter- 
est, therefore, not only to the worker in the particular field but also 
to the mathematician, inasmuch as methods adequate to the treat- 
ment of these problems can best be presented in the precise and 
accurate language of mathematics. Moreover, statistical methods 
are grounded in statistical theory which is a branch of applied mathe- 
matics. 

Although it is true that some statistical problems are ultimately 
problems in advanced mathematics, many of which mathematicians 
have not yet been able to solve, nevertheless a large and interesting 
part of statistical analysis requires mathematics no more advanced
than elementary algebra. 

It has been said that sooner or later every true science tends to 
become mathematical. The notation of mathematics is simply a 
language and it is not limited to any particular field of knowledge. 
The following quotations are inserted to help the student approach 
the study of statistics in the proper spirit. 

1. Mathematics, the science of the ideal, becomes the means of investigating,
understanding, and making known the world of the real. — White.

2. Probably among all the pursuits of the university, mathematics preeminently
demands self-denial, patience, and perseverance. — Todhunter.

3. From time immemorial, there has been but one way to become a mathe- 
matician and there will never be another: it is a way interior to the subject and 
involves years of assiduous toil. Short-cuts to mathematical scholarship there 
are none, whether the seeker be a philosopher or a king. — Keyser. 

4. Will is the creative force. Without the will to learn there is no learning. 
And when the will is feeble and confused, learning lags. — Mursell. 

5. The theory of statistics is not easy, not so much because it is abstruse, as 
because the ideas are new to most people, and a good deal of hard thinking and 
patient work will be necessary. . . . Statistical work always involves a lot of 
computing [and] there is no better way of learning statistics than by working 
through examples. — Tippett. 

5. Problem Assignments. The student should realize at the outset
that statistical methods are not substitutes for thinking but are
aids and supplements to it. A superficial knowledge of statistical 
technique cannot take the place of good judgment. Mere ability to 
substitute in formulas should not be confused with genuine statistical 
sophistication and insight. To the serious and capable student who 
intends to master this course, formulas will be a set of functioning 
concepts and tools rather than machines into which material may be 
fed to grind out a meaningless answer. 

This opportunity is also taken to point out that even mathemat- 
ical discourse consists of sentences. Punctuation should not be 
omitted in sequences of equations and other mathematical state- 
ments. (It is admitted, however, that many of us find this difficult 
to remember.) 

Throughout the book exercises are inserted to give the student an 
opportunity to test his knowledge of the theory and methodology, 
and to develop his power of analysis. In grading the solutions, value 
will be attached to accuracy, thoroughness, neatness, and systematic 
arrangement of the work. 

6. Calculating Machines. 1 A full description of the parts of a cal- 
culating machine and their operation may be obtained from an In- 
struction Book which is furnished by the manufacturer, so only a 
brief description will be given here. 

A calculating machine is constructed to add and subtract. By
means of continued addition or subtraction, operations involving 
multiplication, division, and square root can also be performed with 
great speed. 

1 The early history of modern computing machines is outlined in the American
Mathematical Monthly, vol. 31 (1924), pp. 422-429.

In addition to a keyboard on which numbers can be punched, most
machines have a sliding carriage, carrying two dials one above the 
other. These dials are called revolution register (upper dial) and 
product register (lower dial). In finding a product nx, one of the 
factors n is punched on the keyboard and as the motive crank at the 
side is turned, 1 the other factor x appears on the upper dial. The 
product nx is then read from the lower dial. 

An important property of the modern calculating machine is its 
adaptability to short cuts and combinations of operations. For 
example, one may multiply two numbers n and x together and add the
result to a third number k without tabulating the intermediate steps. 
This is accomplished by punching the number k on the keyboard, 
transferring it to the lower dial (product register), and then proceed- 
ing as in finding the product nx. The result nx + k is then read 
from the lower dial. An extension of this procedure is especially 
useful in a series of computations where k and n are constant and 
various values are assigned to x. To describe the procedure, suppose
it is required to calculate the successive values of 12 + 6x for
x = 5, 7, 15, 12, etc. The number k = 12 is first registered on the
lower dial, then the factor n = 6 is placed on the keyboard, and by
turning the crank forward five times to make the first value of x = 5
appear on the upper dial, the result 12 + 6 × 5 appears on the
lower dial. Instead of clearing the dial, the crank is now turned
forward twice more to rebuild the value x = 5 into x = 7, and the
result 12 + 6 × 7 can be read from the lower dial. In rebuilding
x = 15 into x = 12 the crank is turned backwards. This procedure
can be repeated until all the required values of 12 + 6x have been
calculated. A process of this sort is called the continuous method of
calculating.
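
The continuous method can be imitated in a few lines of code. The sketch below is only an illustration: the function name `continuous_method` and the turn counter are assumptions made for this example, and the count ignores carriage shifts, treating every change of x as single turns of the crank.

```python
def continuous_method(k, n, xs):
    """Imitate the continuous method for k + n*x on a crank-driven machine.

    The product register starts at k. Each forward turn of the crank adds n
    and each backward turn subtracts n, so moving the revolution register
    from one x to the next needs only as many turns as the values differ by.
    """
    product_register = k      # lower dial, preloaded with k
    revolution_register = 0   # upper dial, shows the current x
    results = []
    total_turns = 0
    for x in xs:
        turns = x - revolution_register   # positive: forward, negative: backward
        product_register += n * turns
        revolution_register = x
        total_turns += abs(turns)
        results.append(product_register)
    return results, total_turns


values, turns = continuous_method(12, 6, [5, 7, 15, 12])
print(values)  # [42, 54, 102, 84]
print(turns)   # 18
```

Under the same simplification, clearing the dials before each value would take 5 + 7 + 15 + 12 = 39 turns instead of 18.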

In most of the exercises in this course, the computations are not 
laborious and calculating machines are not required. However, if 
machines are available they may be used to advantage in Chapters 
IV and VI. The student who desires to develop skill on a calculat- 
ing machine should begin now to study an Instruction Book and 
practice the fundamental operations explained there. 

1 The beginner will probably wish to practice on a manually operated machine
before attempting to use the high-speed electric and automatic machines.

7. Collateral Reading. Perhaps no single textbook can meet all
the needs of all students of statistics. There are several good books
on elementary statistics which, although not fundamentally different,
present different points of view on certain topics and treat them with
varying degrees of emphasis depending upon the field of major inter- 
est. At least some of the books listed below should be readily avail- 
able on the reserve shelf of the library. The list should be useful to 
those who wish to study more fully certain details in which they may 
be interested. 

1. Bivins — The Ratio Chart in Business. Codex Book Co. 

2. Burgess — The Mathematics of Statistics. Houghton Mifflin and Co. 

3. Camp — The Mathematical Part of Elementary Statistics. D. C. Heath 
and Co. 

4. Deming — Statistical Adjustment of Data. John Wiley & Sons, Inc.

5. Freeman — Industrial Statistics. Wiley. 

6. Garrett — Statistics in Psychology and Education. Longmans, Green 
and Co. 

7. Glover — Tables of Applied Mathematics. Wahr.

8. Haskell — Graphic Charts in Business. Codex Book Co.

9. Mills — Statistical Methods, Revised. Henry Holt and Co.

10. Pearl — Medical Biometry and Statistics. W. B. Saunders and Co. 

11. Rider — Statistical Methods. Wiley. 

12. Scarborough — Numerical Mathematical Analysis. The Johns Hopkins 
Press. 

13. Snedecor — Statistical Methods. Collegiate Press, Inc., Ames, Iowa. 

14. Treloar — Statistical Reasoning. Wiley. 

15. Walker — Elementary Statistical Methods. Holt.

16. Yule and Kendall — The Theory of Statistics. Griffin and Co.


CHAPTER I 

FREQUENCY DISTRIBUTIONS 


1. Variables and Constants. A variable is a number symbol 
which may take on any value in a set of values which is called its 
range. A constant is a symbol whose range consists of only one value 
(in a particular discussion or situation). Letters toward the end of 
the alphabet, such as x, y, u, and v, are commonly used to denote 
variables. When a constant does not have a definite value such 
as 2, f, r, and so forth, it is customary to represent the constant by a 
letter toward the beginning of the alphabet. 

Two famous constants are 

π = 3.14159..., e = 2.71828...

They occur in mathematics in many important, interesting, and 
even curious ways. As instances of the latter, the following ex- 
amples are noteworthy. 

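\[ e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots, \quad \text{where } n! = n(n-1)(n-2) \cdots 1. \]

\[ \frac{\pi}{4} = \cfrac{1}{1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cfrac{7^2}{2 + \cdots}}}}} \]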

The expression for e is called a convergent infinite series and that for 
π/4 a continued fraction.
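
Both expansions can be checked numerically. The sketch below assumes the series e = 1 + 1/1! + 1/2! + ... and the continued fraction π/4 = 1/(1 + 1²/(2 + 3²/(2 + 5²/(2 + ...)))); the function names and the truncation depths are choices made for this illustration only.

```python
import math


def e_series(terms):
    """Partial sum 1 + 1/1! + 1/2! + ... using the given number of terms."""
    total = 0.0
    term = 1.0            # 1/0!
    for n in range(terms):
        total += term
        term /= (n + 1)   # turn 1/n! into 1/(n+1)!
    return total


def pi_over_4(depth):
    """Truncate 1/(1 + 1^2/(2 + 3^2/(2 + 5^2/(2 + ...)))) after `depth` levels."""
    tail = 2.0
    # Build the fraction from the innermost level outward.
    for k in range(depth, 1, -1):
        odd = 2 * k - 1
        tail = 2.0 + odd * odd / tail
    return 1.0 / (1.0 + 1.0 / tail)


print(e_series(15), math.e)
print(pi_over_4(1000), math.pi / 4)
```

The series for e settles to many decimal places within about fifteen terms, whereas the continued fraction approaches π/4 slowly, so a large depth is needed for even three-place agreement.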

2. Variates. In general, statistical data are obtained by taking 
observations or measurements on one or more variables. The values 
thus obtained are sometimes called variates. 1 For example, in computing
the average monthly rainfall of a region the variable is rainfall
and the amount of rainfall for any month is a variate. Likewise,
if the bank clearings of the city of Madison are under consideration,
then the variable is bank clearings, and the clearings for any
specified interval are variates. If we denote a variable by x then
the N values which it takes on are denoted by x_1, x_2, ..., x_N.

1 A somewhat different usage of this term is explained in Part II.

Variates are of two kinds: continuous and discrete. Continuous
variates are values of a variable which, theoretically, can be meas- 
ured to any degree of fineness, such as heights, weights, temperatures, 
ages. All the numbers between x = 0 and x = 1 form a set of con- 
tinuous variates. But if we restrict x to the rational numbers in 
this interval we have a set of separate and distinct values with 
“vacant” spaces between them. Values of a variable which are 
thus restricted to particular values in order to have any meaning 
are called discrete variates. Other examples of discrete variates are: 
size of families, closing prices of stocks, “successes” in tossing a coin.
A set of discrete variates is usually obtained by counting whereas 
continuous variates are usually obtained by measurement. 

3. Accuracy of Measurements. In the case of continuous vari- 
ates, the observed values as recorded can never be absolutely estab- 
lished by measurement. Thus, the height or weight of an object can 
be measured only approximately, the error depending upon the pre- 
cision of the instrument and the care and accuracy of the observer. 
However, it is not always necessary that measurements be recorded 
as accurately as it is possible to make them. Similarly, in the case 
of discrete variates the standard of accuracy used may be less 
than it is possible to obtain. In population statistics, for example, 
it may be sufficient to record the numbers to the nearest 
thousand, with three zeros at the end to fill out to the decimal point. 
Thus, 

City Population 

A 326,000 

B 729,000 

On the other hand, the exact number of students in a university 
might be required. The degree of accuracy needed is determined 
by the purpose of the investigation and it is limited by the closeness 
with which the variables can be measured. 

It follows, therefore, that the degree of accuracy in the final result 
of a problem involving computations is limited by that of the original 
data. Students sometimes carry results of problems to five or more 
decimal places when the original data do not justify more than two
or three decimal places. A table of measurements which constitutes
the raw data for a statistical investigation should always specify the 
degree of accuracy in the readings. Thus, if monthly rainfall is being 
measured to the nearest hundredth of an inch, and one measurement 
seems to be exactly 5 inches, it should be recorded as 5.00 inches, with 
two zeros. A measurement that is merely recorded as 5 means it is 
correct to the nearest integer and its true value lies between 4.5 and 
5.5, whereas 5.00 means the true value is known to lie between 4.995 
and 5.005. The three digits in 5.00 are said to be significant.
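
The range of true values implied by a recorded reading can be worked out mechanically. The following is a minimal sketch; the function name `implied_interval` is hypothetical, and the reading is passed as text so that trailing zeros such as those in 5.00 are preserved.

```python
from decimal import Decimal


def implied_interval(recorded):
    """Return the (low, high) range of true values implied by a recorded reading.

    `recorded` is the reading exactly as written, e.g. "5" or "5.00". The
    number of decimal places written fixes the half-unit of uncertainty.
    """
    value = Decimal(recorded)
    # One unit in the last recorded place, e.g. 0.01 for "5.00".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


print(implied_interval("5"))     # (Decimal('4.5'), Decimal('5.5'))
print(implied_interval("5.00"))  # (Decimal('4.995'), Decimal('5.005'))
```

Decimal arithmetic is used instead of binary floating point so that the limits 4.995 and 5.005 come out exactly.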

4. Necessity for Classification. After the data have been collected
in any statistical investigation the first step has to do with
introducing order in the raw material. Usually we have some hun- 
dreds of variates which have been recorded merely in the arbitrary 
order in which the observations or measurements happened to be 
made. But in order to analyze a series of variates so that intelligent 
judgments may be formed about it or that comparisons may be made 
between two series of variates, proper classification is necessary and 
of prime importance. 

Such classification is not always an easy thing to effect, because it 
is the one part of statistical methods for which no very definite rules 
can be given. Most people, until they have tried, imagine that to 
collect and arrange data in classes and in tables is a straightforward 
procedure involving no great technique or experience. Although 
much can be learned from a careful study of the illustrations and dis- 
cussions that appear in the following pages and the compilations of 
reputable bureaus such as the census volumes, nevertheless, experi- 
ence is the best teacher in effecting the most appropriate classification 
for any set of variates. 

5. Tabulation. In carrying out the process of classification, it 
becomes natural to arrange the results in tabular form, setting forth 
clearly and explicitly the statistics one wishes to present. In draw- 
ing up any table the following general rules should be observed: 

(1) Every table must be self-explanatory. To accomplish this 
the title should be short, but not at the expense of clearness. 

(2) Full explanatory notes, when necessary, should be incorporated
in the table, either directly under the descriptive title and 
before the body of the table, or else directly under the form. 

(3) The columns and rows should be arranged in a logical order to 
facilitate comparisons.

(4) In tabulating long columns of figures, spaces should be left
after every five or ten rows. Long unbroken columns are con- 
fusing, especially when one is comparing two numbers in a 
row but in widely separated columns. 

(5) If the numbers tabulated have more than three significant 
figures, the digits should be grouped in threes. Thus, one 
should write 4 685 732, not 4685732. 

(6) Double lines at the top (or at the top and bottom) may en- 
hance the effectiveness of a table. If the table nicely fills 
the width of the page, no side lines should be used. In such 
cases the omission of the side lines will have the tendency to 
emphasize the other vertical lines and cause the interior col- 
umns to stand out better. The columns should not be widely 
separated and the form of a narrow, compact table should 
have its side lines. 

The following points are particularly important in practical work: 

(7) Source of data should be included. 

(8) Units of the data presented should be clear. 

(9) Accuracy of transcription must not only be striven for but 
actually achieved. A reader who finds one error (even though 
this be the only one) is likely to disparage the whole table. 
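
Rule (5) above, grouping digits in threes, is easily mechanized when tables are prepared by program. A minimal sketch, in which the helper name `group_in_threes` is an assumption of this example:

```python
def group_in_threes(n):
    """Format a non-negative integer with its digits grouped in threes by spaces."""
    return f"{n:,}".replace(",", " ")


print(group_in_threes(4685732))  # 4 685 732
```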

Table 1 — Grades of 100 Students in Freshman Mathematics

75  86  66  86  50  78  66  79  68  60
80  83  87  79  80  77  81  92  57  52
58  82  73  95  66  60  84  80  79  63
80  88  58  84  96  87  72  65  79  80
86  68  76  41  80  40  63  90  83  94

76  66  74  76  68  82  59  75  35  34
65  63  85  87  79  77  76  74  76  78
75  60  96  74  73  87  52  98  88  64
76  69  60  74  72  76  57  64  67  58
72  80  72  56  73  82  78  45  75  56

6. Frequency Distribution. From the standpoint of a mathematical
analysis of statistics, the most important form of tabulation is
the so-called frequency distribution. Rough data do not present
any clear ideas of description unless they are organized and condensed
in a systematic way. We therefore partition the raw data into
classes of appropriate size, showing the corresponding frequency of
variates in each class. When any set of statistics is systematically
arranged in this way it is called a frequency distribution. For
example, upon an examination of the raw data of Table 1, it is difficult
to state any very definite conclusions as to whether these grades
represent preponderantly good students or poor ones. The frequency
distribution of Table 2, however, does give us more precise
information. We see at a glance that there were 32 students with grades
between 70 and 80, and that all but 16 had grades of 60 or above. In
Table 3, the confusion of detail is still more apparent. The
corresponding frequency distribution is given in Table 4.

Table 2 — Frequency Table of 100 Grades

Class Limits   Tally Marks                              Frequency
30-39          //                                            2
40-49          ///                                           3
50-59          ///// ///// /                                11
60-69          ///// ///// ///// /////                      20
70-79          ///// ///// ///// ///// ///// ///// //       32
80-89          ///// ///// ///// ///// /////                25
90-99          ///// //                                      7
Total                                                      100
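
The tallying that leads from Table 1 to Table 2 can be reproduced by program. The sketch below is one way to do it, assuming classes of width 10 beginning at 30; the function name `frequency_table` and its parameters are choices made for this illustration.

```python
from collections import Counter

# The 100 grades of Table 1.
grades = [
    75, 86, 66, 86, 50, 78, 66, 79, 68, 60,
    80, 83, 87, 79, 80, 77, 81, 92, 57, 52,
    58, 82, 73, 95, 66, 60, 84, 80, 79, 63,
    80, 88, 58, 84, 96, 87, 72, 65, 79, 80,
    86, 68, 76, 41, 80, 40, 63, 90, 83, 94,
    76, 66, 74, 76, 68, 82, 59, 75, 35, 34,
    65, 63, 85, 87, 79, 77, 76, 74, 76, 78,
    75, 60, 96, 74, 73, 87, 52, 98, 88, 64,
    76, 69, 60, 74, 72, 76, 57, 64, 67, 58,
    72, 80, 72, 56, 73, 82, 78, 45, 75, 56,
]


def frequency_table(values, lowest=30, width=10):
    """Count how many values fall in each class of the given width."""
    counts = Counter((v - lowest) // width for v in values)
    table = []
    for i in sorted(counts):
        lower = lowest + i * width
        table.append((lower, lower + width - 1, counts[i]))
    return table


for lower, upper, freq in frequency_table(grades):
    print(f"{lower}-{upper}  {freq}")
```

The printed frequencies are 2, 3, 11, 20, 32, 25, and 7, which total 100 and agree with Table 2.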

The width of a class is called the class interval, and in general 
the successive class intervals should be of equal width. The mid- 
value of such an interval is variously called the class mark, mid- 
value, central value. The width of a class interval is therefore 
seen to be the common difference between two consecutive class 
marks. It is also the difference between the lower (or upper) 
limit of two successive classes. Thus, in Table 4, the class inter- 
val is half an inch and the successive class marks are 0.245, 0.745, 
etc., inches. 
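
Class marks and the class interval follow directly from the class limits. The sketch below uses the first three grade classes of Table 2; the rainfall limits 0.00-0.49 and 0.50-0.99 are an assumption chosen to be consistent with the class marks 0.245 and 0.745 quoted above, since Table 4 itself is not reproduced here. The function names are hypothetical.

```python
from decimal import Decimal


def class_marks(limits):
    """Midpoint of each class, given (lower, upper) class limits as strings."""
    return [(Decimal(lo) + Decimal(hi)) / 2 for lo, hi in limits]


def class_interval(limits):
    """Common difference between the lower limits of successive classes."""
    lowers = [Decimal(lo) for lo, _ in limits]
    widths = {b - a for a, b in zip(lowers, lowers[1:])}
    if len(widths) != 1:
        raise ValueError("class intervals are not of equal width")
    return widths.pop()


grade_classes = [("30", "39"), ("40", "49"), ("50", "59")]
rain_classes = [("0.00", "0.49"), ("0.50", "0.99")]  # assumed limits

print(class_marks(grade_classes), class_interval(grade_classes))
print(class_marks(rain_classes), class_interval(rain_classes))
```

For the grades the class marks are 34.5, 44.5, 54.5 with interval 10; for the assumed rainfall classes they are 0.245 and 0.745 with interval 0.50.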

7. Class Intervals. Grouping variates into the most appropriate 
number of classes is a matter of judgment. The choice of intervals 
to be used in tabulating any particular set of variates depends upon 
the nature and characteristics of the data and the purpose for which 
it is to be used. In the case of discrete variates, the unit is a natural 
interval and sometimes it is satisfactory. (See Tables 10 and 11.) 
However, for both discrete and continuous variates the following 
conditions should guide the choice: (a) We desire to be able to treat 
all the values assigned to any one class, without serious error, as if

Table 3 — Monthly Rainfall at Iowa City, 1890-1925
Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec. 



1890 2.75 0.75 1.80 1.83 2.20 7.99 0.30 2.29 1.44 2.11 1.56 0.31 

1891 1.49 1.30 4.41 1.11 4.46 2.80 3.01 3.45 2.33 1.63 2.93 2.72 

1892 1.46 1.23 3.15 4.30 9.23 8.29 6.20 2.50 1.18 1.02 1.38 2.84 

1893 1.18 1.75 2.82 4.37 1.79 3.01 3.56 1.64 3.07 1.98 1.75 1.52 

1894 1.95 1.64 2.03 2.72 3.09 2.40 0.90 2.40 4.96 2.30 1.80 0.98 

1895 2.37 0.64 1.25 1.66 4.26 1.10 10.10 1.77 3.43 1.38 1.78 2.84 

1896 0.70 1.51 0.92 5.14 4.10 1.86 7.04 2.44 1.82 2.74 1.16 0.55 

1897 3.66 1.30 2.07 4.60 3.11 2.38 3.83 1.85 3.54 0.33 1.98 2.48 

1898 4.62 1.15 3.02 2.89 4.80 3.26 2.27 2.85 2.54 4.38 1.10 0.53 

1899 0.59 1.82 1.43 3.23 9.49 4.50 3.78 2.39 0.93 1.66 1.15 1.93 

1900 0.73 2.20 3.32 3.31 4.31 2.18 5.25 6.27 4.35 3.61 1.43 0.75 

1901 1.07 1.97 3.62 2.36 1.54 3.33 1.29 0.66 2.56 1.78 0.79 2.34 

1902 1.29 0.85 1.29 1.91 3.75 7.46 6.89 10.91 5.87 3.12 2.25 2.21 

1903 0.67 1.03 1.86 3.11 6.90 1.95 4.76 3.45 5.38 3.60 0.97 1.27 

1904 1.74 0.84 2.73 5.49 2.68 2.14 2.49 3.93 3.12 1.59 0.25 1.96 

1905 1.22 1.90 2.28 3.36 5.37 6.68 3.59 2.62 1.54 5.36 2.92 1.04 

1906 2.51 1.73 2.25 1.83 2.33 3.64 1.42 5.34 0.89 1.48 3.08 1.64 

1907 2.12 0.22 1.59 1.58 5.47 6.04 9.21 2.98 2.85 0.86 1.07 0.53 

1908 0.32 2.08 2.94 2.78 7.78 2.87 5.40 7.47 1.82 1.99 1.84 0.43 

1909 1.97 1.09 2.00 7.21 4.40 4.58 5.75 1.88 2.43 1.59 4.88 2.52 

1910 1.79 0.39 0.28 2.56 3.57 0.98 2.22 4.98 3.87 0.57 0.69 0.46 

1911 0.87 4.82 1.30 3.02 4.74 2.98 3.70 4.27 5.07 2.78 3.01 2.29 

1912 0.26 1.21 2.30 3.50 2.88 2.60 3.60 3.62 2.67 3.54 1.11 0.75 

1913 1.19 1.42 2.69 1.83 6.91 6.28 0.39 2.97 3.19 3.66 0.46 1.02 

1914 1.28 0.93 2.63 2.37 4.87 5.32 1.53 2.99 7.97 1.65 0.37 1.89 

1915 2.15 2.42 0.92 0.65 7.65 4.33 8.11 1.80 9.31 1.84 1.80 0.80 

1916 3.18 0.59 5.06 1.83 5.99 3.92 1.57 2.83 3.49 3.19 1.42 1.15 

1917 1.09 0.19 2.19 3.43 7.33 6.49 2.84 2.79 6.23 2.28 0.30 0.57 

1918 1.10 1.46 0.33 3.43 6.22 8.36 4.87 6.72 2.00 2.05 2.10 1.62 

1919 0.08 2.63 2.65 4.28 4.49 7.07 1.03 2.67 5.10 4.01 3.84 0.61 

1920 0.84 1.33 4.22 4.75 3.76 2.86 2.79 2.90 1.20 0.98 1.80 2.45 

1921 0.35 0.49 2.46 6.20 4.44 2.46 3.59 8.61 7.83 2.47 0.74 3.19 

1922 1.11 1.46 2.18 3.49 5.52 0.28 6.46 1.03 2.91 1.06 5.28 0 M 9 

1923 1.09 0.67 4.83 0.86 2.63 6.21 2.37 4.01 9.27 2.35 1.13 0.73 

1924 1.35 0.83 2.10 1.09 1.69 8.71 3.67 5.67 2.60 1.64 0.93 1.75 

1925 0.29 1.04 0.99 3.07 1.06 5.61 3.63 3.14 5.59 3.90 1.00 1.66 
they were equal to the class mark for that interval; e.g., as if all
23 items in the first class of Table 4 were exactly 0.245 inches, etc.
(b) For convenience and brevity we desire to make the interval as
large as possible subject to the first condition. These conditions will
generally be fulfilled if the interval is so chosen that the whole
number of classes lies between 10 and 25. A small number of classes
may "cover up" too much detail whereas a large number may
reveal too much detail for one to comprehend readily (which is
just the objection to the table of original data).

Table 4 — Frequency Table of Monthly Rainfall at Iowa City, 1890-1925

Class Interval    Mid-x    Frequency
 0.00- 0.49       0.245       23
 0.50- 0.99       0.745       42
 1.00- 1.49       1.245       58
 1.50- 1.99       1.745       62
 2.00- 2.49       2.245       49
 2.50- 2.99       2.745       47
 3.00- 3.49       3.245       32
 3.50- 3.99       3.745       27
 4.00- 4.49       4.245       18
 4.50- 4.99       4.745       15
 5.00- 5.49       5.245       14
 5.50- 5.99       5.745        7
 6.00- 6.49       6.245       10
 6.50- 6.99       6.745        5
 7.00- 7.49       7.245        6
 7.50- 7.99       7.745        5
 8.00- 8.49       8.245        3
 8.50- 8.99       8.745        2
 9.00- 9.49       9.245        5
 9.50- 9.99       9.745        0
10.00-10.49      10.245        1
10.50-10.99      10.745        1
Total                         432

A preliminary
inspection of the data should accordingly be made and the highest
and lowest values selected. Dividing the difference between these
by the tentative number of classes, we have our approximate value
Table 5 — Monthly Rainfall at Des Moines, 1890-1925

Year  Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
1890  2.62  1.17  0.91  0.78  3.00  4.91  1.10  3.35  1.57  4.48  0.74  0.11
1891  1.82  1.13  2.25  2.12  3.29  5.60  2.78  4.22  1.64  2.41  1.34  1.54
1892  1.60  1.35  2.47  3.36  8.77  3.41  8.64  2.45  1.12  2.54  0.76  1.95
1893  0.56  1.28  1.15  5.61  2.84  4.69  3.55  1.60  1.33  0.22  1.51  1.30
1894  1.09  1.39  1.78  1.70  1.41  1.67  0.29  1.89  4.46  2.24  0.99  1.15
1895  1.30  0.60  0.50  3.41  2.86  5.26  3.10  3.57  3.20  0.29  0.85  1.86
1896  0.60  0.79  1.24  3.47  6.50  2.69  8.15  5.49  3.61  2.69  1.10  0.85
1897  2.02  0.71  2.13  7.37  2.31  3.15  2.88  1.77  1.56  0.85  0.34  1.98
1898  1.59  0.82  1.35  2.64  4.22  6.85  1.86  1.09  1.91  3.56  1.87  0.57
1899  0.29  0.57  1.04  2.22  6.71  3.53  3.20  3.53  1.17  0.59  1.76  2.12
1900  0.20  0.50  3.07  3.82  4.76  4.89  5.15  8.02  3.66  3.08  0.96  0.35
1901  1.01  1.11  3.02  2.26  1.40  2.41  1.72  0.67  2.60  2.14  0.40  1.03
1902  0.91  0.52  1.15  1.55  4.69  7.27  5.95  7.82  5.03  3.70  1.65  1.77
1903  0.20  1.12  1.09  1.64  0.64  3.06  3.62  6.72  1.62  1.32  0.31  0.09
1904  1.22  0.22  1.20  5.48  3.16  2.08  6.94  2.60  1.95  1.50  0.06  2.02
1905  1.08  1.00  2.16  3.29  4.44  5.73  4.53  5.21  3.47  3.64  2.34  0.55
1906  2.07  0.86  1.84  2.96  2.21  3.80  2.67  4.69  3.24  1.18  2.29  1.46
1907  0.87  0.93  1.18  1.48  2.97  4.13 10.20  5.03  2.40  1.70  1.12  1.01
1908  0.46  1.15  1.43  2.69  9.89  5.93  1.56  6.54  0.94  3.68  0.95  0.31
1909  1.61  0.90  1.56  5.14  4.24  7.01  4.41  0.14  2.06  2.89  3.71  2.32
1910  1.72  0.20  0.33  1.13  3.26  3.11  0.86  2.40  3.82  0.68  0.53  0.20
1911  0.84  2.91  1.14  4.23  2.44  0.75  1.16  1.82  7.68  2.61  1.22  3.18
1912  0.53  1.86  2.87  2.75  5.62  2.60  3.07  3.52  4.20  3.75  1.11  0.30
1913  1.10  0.65  3.03  3.41  5.06  3.52  4.05  3.44  2.65  2.67  1.03  1.05
1914  0.85  1.24  1.18  1.52  4.83  3.89  1.22  1.77  4.81  3.57  0.35  1.28
1915  1.96  3.20  1.16  1.36  8.21  3.60  9.39  1.71  4.51  0.43  1.24  0.65
1916  2.66  0.61  0.60  2.44  3.87  2.42  1.50  2.62  1.72  2.11  1.46  0.65
1917  6.53  0.52  2.30  5.52  3.94  8.16  1.58  1.82  1.99  0.92  0.21  0.88
1918  0.78  1.45  0.29  1.81  5.87  5.63  1.18  2.54  0.91  3.81  2.10  1.35
1919  0.08  3.00  3.67  5.30  2.96  7.36  2.68  2.19  7.47  2.20  3.84  0.93
1920  0.44  0.74  3.92  4.09  3.14  1.25  5.66  2.11  4.44  1.89  1.63  1.38
1921  0.59  0.92  1.07  3.72  3.62  4.66  2.49  6.63  7.16  1.51  0.35  0.80
1922  0.85  0.64  2.25  2.84  6.87  1.63  7.13  6.63  3.00  3.41  2.54  0.25
1923  0.88  0.36  4.34  1.76  4.78  4.95  0.78  5.34  5.17  1.10  0.55  0.61
1924  1.02  1.98  3.10  0.78  1.26  9.30  0.98  4.15  3.47  0.77  0.53  1.62
1925  0.23  0.50  0.88  1.64  0.77  6.40  2.21  4.79  3.75  3.22  0.32  1.67
of the interval. After a little preliminary reconnoitering an
appropriate number of classes and their limits can be determined. Thus,
in Table 3, the highest value noted was 10.91 and the lowest 0.08
(verify). The difference between these is 10.83, which suggests that
if we took 20 classes we would have approximately a half inch as the
width of a class interval. This, however, assumes we would start
with 0.08 as our lower limit, which would give us awkward figures as
limits. Therefore, our judgment suggests it would be better to start
with 0 and continue by half-inch intervals as far as is necessary to
take in the range of the given variates. We have estimated it will
take approximately 20 of these; actually it turns out to be 22. This
number of intervals and their width is consistent with the general
conditions (a) and (b) given above. On page 16 are given some
supplementary rules which in general are helpful in making a
frequency distribution.
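The arithmetic of this paragraph may be retraced as follows. This is only a sketch in Python: the figures 10.91, 0.08, and 20 come from the text, while the variable names and the choice to round the width to 0.5 are illustrative assumptions.

```python
highest, lowest = 10.91, 0.08   # extremes of Table 3, as stated in the text
tentative_classes = 20          # tentative number of classes

data_range = highest - lowest                    # 10.83
approx_width = data_range / tentative_classes    # roughly half an inch

width = 0.5   # approximate width rounded to a convenient figure
start = 0.0   # convenient lower limit in place of the awkward 0.08

# Number of half-inch classes, counted from `start`, needed to take in
# the highest value: classes 0.00-0.49, 0.50-0.99, ..., 10.50-10.99.
n_classes = int((highest - start) // width) + 1

print(round(data_range, 2), round(approx_width, 2), n_classes)
```

The count of 22 classes agrees with Table 4 and lies between 10 and 25, as the general conditions suggest.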

8. Distinction between Class Limits and Class Boundaries. The 
pairs of numbers written in the column of classes of a frequency dis- 
tribution are the lower and upper class limits , sometimes called open 
class limits. For instance, 1.00-1.49 are the limits of the third class 
of Table 4. When the measurements of Table 3 were made, readings 
were recorded to the nearest hundredth of an inch. Thus, a measure- 
ment which was more than 1.485 and less than 1.495 was recorded 
as 1.49. Likewise, if a measurement was more than 0.995 but less 
than 1.005, it would be recorded as 1.00. Therefore, the third class 
of Table 4 includes all measurements more than 0.995 and less than 
1.495. These values are then the true or closed limits of the third 
class and are known as class boundaries or end values. A class bound- 
ary is the value halfway between the upper limit of one class and the 
lower limit of the next class. For example, the upper boundary of 
the fourth class of Table 4 is 1.995 which is the lower boundary 
of the fifth class. If we denote the variate values by x, the 
following table illustrates these remarks for the first five classes of 
Table 4. 

Class Limits    End-x    Mid-x
0.00-0.49       0.495    0.245
0.50-0.99       0.995    0.745
1.00-1.49       1.495    1.245
1.50-1.99       1.995    1.745
2.00-2.49       2.495    2.245

The width of a class interval is the same, however, whether the
classes are expressed in terms of class limits or class boundaries, being
the difference between the beginning of one class and the beginning
of the next class. Similarly, the class mark as the mid-point of the
interval is unaffected. Thus, for the class limits 1.00-1.49, the
class mark is $\tfrac{1}{2}(1.00 + 1.49) = 1.245$; for the corresponding class
boundaries, the class mark is $\tfrac{1}{2}(0.995 + 1.495) = 1.245$.

The distinction between class limits and class boundaries is an 
important one in plotting graphs, but in tabulating it is the class 
limits that should be expressed. 
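A short sketch in Python may make the distinction concrete. It assumes, as the text states, that readings were recorded to the nearest hundredth of an inch; the function name `boundaries` is hypothetical.

```python
UNIT = 0.01  # readings recorded to the nearest hundredth of an inch

def boundaries(lower_limit, upper_limit, unit=UNIT):
    """Return the (lower, upper) class boundaries for given class limits.

    Each boundary lies half a recording unit beyond the written limit,
    i.e. halfway between adjacent limits of neighbouring classes.
    """
    half = unit / 2
    return round(lower_limit - half, 3), round(upper_limit + half, 3)

# Third class of Table 4.
lo_b, hi_b = boundaries(1.00, 1.49)

width_from_boundaries = round(hi_b - lo_b, 3)
mark_from_limits = round((1.00 + 1.49) / 2, 3)
mark_from_boundaries = round((lo_b + hi_b) / 2, 3)

print(lo_b, hi_b, width_from_boundaries, mark_from_limits, mark_from_boundaries)
```

The width and the class mark come out the same whether computed from limits or from boundaries, as asserted above.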

9. Rules for Making a Frequency Distribution.

(1) Determine the range of the table by finding the difference
between the highest value and the lowest value among the items.

(2) Determine the number of equal parts into which the range
shall be divided. The size of the class interval and the number
of intervals depend upon the size and nature of the distribution.
(Table 1 contains rather fewer classes than is usually
desirable but an interval of 10 units is quite conventional in
students' grades. An interval of 5 would be used if grades
of A, A-, B, B-, etc., were given instead of A, B, etc.)
Intervals of 0.5, 1, 2, 3, 5, 7, or 10 are the most common.

(3) Arrange a sheet with three headings: class interval, tally 
marks, frequency. 

(4) Read off the items in the raw table and for each one record a 
mark, as shown in Table 2. 

(5) Write the sum of the marks in each row in the frequency col- 
umn. The sum of the frequencies should, of course, equal the 
total number of variates. 
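The five rules can be sketched in Python for a small sample. The twelve items below are the 1890 row of Table 3; the half-inch interval starting at 0 is the choice made in Section 7, and working in whole hundredths of an inch is merely an implementation assumption made here to avoid rounding trouble at the class limits.

```python
from collections import Counter

# Rule (1): the items and their range (1890 row of Table 3, in inches).
items = [2.75, 0.75, 1.80, 1.83, 2.20, 7.99, 0.30, 2.29, 1.44, 2.11, 1.56, 0.31]
data_range = max(items) - min(items)

# Rule (2): half-inch classes starting at 0, expressed in hundredths.
WIDTH = 50

# Rules (3)-(4): one tally mark per item, keyed by class number
# (class 0 is 0.00-0.49, class 1 is 0.50-0.99, and so on).
tally = Counter(round(x * 100) // WIDTH for x in items)

# Rule (5): write out each row and check the total.
for k in range(max(tally) + 1):
    lower = k * WIDTH / 100
    upper = lower + (WIDTH - 1) / 100
    print(f"{lower:5.2f}-{upper:5.2f}  {'|' * tally[k]:<4} {tally[k]}")

total = sum(tally.values())
print("Total", total)
```

The total of the frequency column equals the number of items, which is the check prescribed in rule (5).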

10. Cumulative Frequencies. The frequencies with which we have 
been concerned may be called absolute frequencies to distinguish 
them from two other kinds which will be mentioned in this course; 
namely, cumulative frequencies and relative frequencies. The first 
of these will be considered here. 

Sometimes a statistical investigation is concerned with the number 
or percentage of variates which are “ less than ” or “ more than ” 
a given value. This is frequently the case in educational tests and 
in wage or salary statistics. Our chief interest in such cases may be 
the accumulated frequency of the several class intervals up to some 
class boundary. Hence we are led to form a cumulative frequency 
table. Such a table is built up by successively adding the several 
(absolute) frequencies; thus: $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, etc., as
illustrated in Table 7, where the data of Table 6 are used. We shall use
$N$ to denote the sum of all the frequencies.

Table 6 — Distribution of Intelligence Quotients (IQ's) of 905 School
Children from 5 to 14 Years of Age. (Derived from
L. M. Terman, The Measurement of Intelligence)

IQ         Frequency
 55- 64        3
 65- 74       21
 75- 84       78
 85- 94      182
 95-104      305
105-114      209
115-124       81
125-134       21
135-144        5


The cumulative frequency (cum f) at any class is the total
(absolute) frequency up to the upper boundary of that class. This is the
reason for placing the cum f entries opposite the end-x values and on
lines between the mid-x entries. Thus, in the cum f column of Table
7, three students had IQ's less than 64.5, 24 less than 74.5, etc. The
entries in the column headed (cum f)/N give the proportions of the
total frequency which are less than the values of the end-x column.
Thus, from this column in Table 7, we can readily see that 88% of
the children had IQ's less than 114.5 and only 11% less than 84.5.

Table 7 — Cumulative Distribution of IQ's (Table 6)

Class Mark   Frequency    Upper Boundary   Cum f             (Cum f)/N
Mid-x        f            End-x

                           54.5              0               0.000
 59.5          3 = f_1
                           64.5              3 = f_1         0.003
 69.5         21 = f_2
                           74.5             24 = f_1 + f_2   0.027
 79.5         78
                           84.5            102               0.113
 89.5        182
                           94.5            284               0.314
 99.5        305
                          104.5            589               0.651
109.5        209
                          114.5            798               0.882
119.5         81
                          124.5            879               0.971
129.5         21
                          134.5            900               0.994
139.5          5
                          144.5            905 = N           1.000
Table 7 is known as a "less than" table. One could of course
cumulate the frequencies from the bottom of the table, getting a
"more than" distribution. The cum f column would then give the
number of children whose IQ's are more than the values at the lower
boundaries of the several class intervals.
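Both ways of cumulating can be sketched in Python with the frequencies of Table 6. The variable names are illustrative; `itertools.accumulate` from the standard library forms the running sums.

```python
from itertools import accumulate

freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]   # Table 6
N = sum(freqs)

# "Less than" table: cum f at the upper boundary of each class
# (64.5, 74.5, ..., 144.5).
less_than = list(accumulate(freqs))

# "More than" table: cumulate from the bottom, giving the frequency
# above the lower boundary of each class (54.5, 64.5, ..., 134.5).
more_than = list(accumulate(reversed(freqs)))[::-1]

# The (cum f)/N column of Table 7, to three decimals.
relative = [round(c / N, 3) for c in less_than]

print(less_than)
print(more_than)
print(relative)
```

At every boundary the "less than" and "more than" counts together account for all N = 905 children.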

The inverse operation to cumulating the frequencies is called
"differencing" and is usually denoted by $\Delta$ (delta). If $S$ denotes
any series of values, then $\Delta S$ denotes the results obtained by
subtracting the first value of $S$ from the second value, the second from
the third, etc. Differencing a column of cumulative frequencies
obviously gives the absolute frequencies. Differencing a column
of (cum f)/N values gives the f/N values.
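As a sketch of the operation in Python (the function name `difference` is hypothetical), differencing the cum f column of Table 7 should return the frequencies of Table 6:

```python
def difference(series):
    """Return the differences: second minus first, third minus second, etc."""
    return [b - a for a, b in zip(series, series[1:])]

# The cum f column of Table 7, beginning with the 0 at the boundary 54.5.
cum_f = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

abs_f = difference(cum_f)
print(abs_f)
```

Note that a series of n values yields n - 1 differences, which is why the leading 0 at the lowest boundary is needed to recover the first frequency.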

Exercises 

1. What is the width of the class interval and what are the values of the
class marks in Table 2?

2. Tabulate the grades of Table 1, using class intervals of 5 units.

3. With reference to Table 3, is it easy to answer such questions as the following:
(a) In how many instances was the monthly rainfall between 2 inches and 3 inches?
(b) In how many instances was the rainfall less than 5 inches?
(c) What was the smallest monthly rainfall recorded?
(d) What per cent of the total measured between 5 inches and 10 inches?
(e) What measurement is the most common?

4. Refer to Table 4 and then answer the above questions.

5. Using your own judgment as to the most appropriate class interval, make
a frequency distribution of the monthly rainfall for Des Moines from
1890 to 1925 (Table 5).

6. For Table 6 state the class boundaries (end values) and the class marks.

7. Difference the cum f column of Table 7.

8. Read the following references:
(a) Mathematics Essential for Elementary Statistics — Walker, Chapter II.
(b) Standards and Requirements in Statistics — Belcher. Journal American
Statistical Association, vol. 21, p. 424.

11. Additional Distributions. The following distributions which 
will be referred to in subsequent chapters will serve as illustrative 
and laboratory material. They are not chosen on account of the 
importance of the data but merely to exemplify methods. 
Table 8 — Distribution of Lengths of 995 Telephone Calls. Time in Seconds

Time       Number of Calls
  0- 99          1
100-199         28
200-299         88
300-399        180
400-499        247
500-599        260
600-699        133
700-799         42
800-899         11
900-999          5

(For future reference: $\bar{x} = 477.3$ secs., $\sigma = 148.5$ secs.)


Table 9 — Distribution of Weight in Pounds Among 1000 8-Year-Old
Glasgow Schoolgirls

Weight (mid-values)    Frequency
29.5                       1
33.5                      14
37.5                      56
41.5                     172
45.5                     245
49.5                     263
53.5                     156
57.5                      67
61.5                      23
65.5                       3

Table 10

Twelve dice were thrown 4096 times; only a throw of 6 was counted a success.
The observed distribution follows:

Successes    Frequency
$x_i$        $f_i$
 0             447
 1            1145
 2            1181
 3             796
 4             380
 5             115
 6              24
 7               7
 8               1
 9               0
10               0
11               0
12               0

(For future reference: $\bar{x} = 2$, $\sigma = 1.296$)
Table 11

Twelve dice were thrown 4096 times; a throw of 4, 5, or 6 points being reckoned
a success. The following distribution was recorded:

Successes    Frequency
 0               0
 1               7
 2              60
 3             198
 4             430
 5             731
 6             948
 7             847
 8             536
 9             257
10              71
11              11
12               0

(For future reference: $\bar{x} = 6.139$, $\sigma = 1.712$)


Table 12 — Frequency Distribution of the Weights of 1000 Male
Students (Original Measurements Made to Nearest Half Pound)

Class         Class                  Cumulative
(Pounds)      Mark      Frequency    Frequency
 90- 99.5      94.75        2            2
100-109.5     104.75       21           23
110-119.5     114.75      104          127
120-129.5     124.75      196          323
130-139.5     134.75      248          571
140-149.5     144.75      197          768
150-159.5     154.75      133          901
160-169.5     164.75       47          948
170-179.5     174.75       25          973
180-189.5     184.75       14          987
190-199.5     194.75        7          994
200-209.5     204.75        4          998
210-219.5     214.75        0          998
220-229.5     224.75        0          998
230-239.5     234.75        1          999
240-249.5     244.75        1         1000

(For future reference: $\bar{x} = 138.65$, $\sigma = 18.03$, $\alpha_3 = .94$)
Table 13 — Distribution of Span (Central Values) in Inches Among 2000
Adult Males (Original Measurements to the Nearest Inch)

Span    Frequency    Span    Frequency
58.5        1        71.5      217
59.5        2        72.5      176
60.5        1        73.5      132
61.5        6        74.5       82
62.5        7        75.5       48
63.5       22        76.5       20
64.5       55        77.5       16
65.5      111        78.5       12
66.5      146        79.5        3
67.5      182        80.5        1
68.5      229        81.5        2
69.5      265        82.5        1
70.5      263        Total    2000


The following references are recommended to those who desire some distri- 
butions which may be more interesting in themselves: 

(a) Per cent Distribution of Deaths in Each Age Period, by Specified Causes.
White Males and White Females, United States, 1942. Source: Metropolitan
Life Insurance Company, Statistical Bulletin, October 1945, p. 7.

(b) Age of American Military Leaders. Source: Metropolitan Life Insurance
Company, Statistical Bulletin, June, July, August, 1945.

(c) Employment Status of the Population by Age and Sex. Source: Population,
Third Series, The Labor Force, Table 5, 16th Census.

(d) Distribution of Population by Age. Source: Statistical Abstract, 1943,
p. 24.


CHAPTER II 

GRAPHICAL REPRESENTATION 

1. The Function Concept. Variables which are linked or related 
in some way are encountered in various fields of human experience. 
Several variables may be linked but we shall, for the present, con- 
sider the simple case where only two variables are involved. For 
example, the two related variables may be time and population, 
variate and frequency, rate of interest and accumulated principal, 
age and insurance premium. The primary purpose of a graph is to 
show diagrammatically how the values of one of two linked variables 
change with those of the other. One of the most useful applica- 
tions of the graph occurs in connection with the representation of 
statistical data. 

Underlying the intelligent use of graphs is the concept of function , 
which is a fundamental notion in mathematics and its applications. 
The mathematical meaning of function is a technical one, entirely 
different from the ordinary meaning. The student usually meets 
the word for the first time in algebra, when a linear or quadratic 
expression is spoken of as a function of x. An example is the equation 

$y = P(1 + x)^2$.

The expression on the right is the function of x ( P being constant) 
and for convenience it is denoted by the single letter y . Here x is an 
interest rate and y denotes the amount to which P dollars will accu- 
mulate in two years at x% per year. 
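The same function may be sketched in Python. The principal of 100 dollars and the name `amount` are assumptions for illustration only, and x is taken here as the rate expressed as a decimal (0.05 for 5%).

```python
def amount(x, P=100.0):
    """Amount to which P dollars accumulate in two years at rate x per year.

    P is held constant; x is the independent variable, given as a decimal.
    """
    return P * (1 + x) ** 2

# A value of y is determined as soon as a value is assigned to x.
for rate in (0.03, 0.04, 0.05):
    print(rate, round(amount(rate), 2))
```

Each assigned value of x determines exactly one value of y, which is the sense in which y depends on x.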

The statement that y is a function of x is written symbolically in 
the form 

y = f(x). 

This implies that a value of the function y is determined when a value 
is assigned to the variable x. For this reason, x is called the
independent variable and y the dependent variable. In place of f other letters
may be used. Thus, any one of the symbols

$g(x)$, $h(x)$, $F(x)$,

and so on, denotes a function of x. The same symbol may be used 
to denote different functions in different problems, but different
symbols are required to represent different functions in the same
problem or discussion.

Examples:

$f(x) = 5x^2 - 3x + 2$,

$\phi(x) = Ke^{-x^2}$.

Any mathematical expression involving a variable x is a function
of x. However, the word is often used to designate a relation that is 
completely divorced from any equation or expression. The central 
idea conveyed by this more general meaning is that of a correspond- 
ence between values of y and values of x. The following definition is 
the result of a development over a long period and its formulation is
due to Dirichlet, a famous German mathematician (1805-59).

Definition. Let there be a set of values assumed by the independent
variable x. If to each x in the set, there corresponds one or
more values of y, then y is said to be a function of x in the set.

It should be observed that this definition 1 is freed from any notion 
of the necessity of specifying the mathematical relation between x 
and y. We may or may not know the special method by which the 
correspondence is set up. A mathematical formula or equation be- 
tween x and y may not even exist. A function may thus be 
considered as being equivalent to a table in which one may look up 
any x of the set of the definition, and find the corresponding y. 

Much of the data in statistics comes under this general definition
of function. Thus, in the following table, net earning is a function
of the year, whether or not there is any equation defining that
functional relationship.

[Table and graph of net earnings by year]

Here the function is defined only for the indicated points
which correspond to the values given in the table. The straight
lines are drawn to help the reader visualize the relative positions
of these values and not to represent the function at intermediate
points. They may, however, be thought of as a first
approximation to the unknown function between the given values.
Such a representation of the function could not, of course, be
assumed in the case of discrete variates because then the function
is discontinuous and does not exist except for the given values.

1 A classical example is the function which is defined for the infinite set of
numbers from x = 0 to x = 1 to be unity for all rational numbers and zero for
all irrational numbers.
Referring again to the above definition, if there is only one value
of y corresponding to each value of x then y is called a single-valued
function of x; otherwise y is said to be a multiple-valued function of
x. Child weight would be an example of a multiple-valued function
of age, being different for different children. The weight of a
particular child would be a single-valued function of age. For the most
part we shall be concerned with single-valued functions.

2. Charts. A detailed study of the technique of representing data 
by broken lines, by charts or bar graphs, etc., will not be undertaken 
here. It is a rather specialized and non-mathematical subject, and 
the student interested in plain-scale cartography can readily find 
books on the subject which are very readable. 1 (A discussion of 
ratio charts is given in Chapter VII.) 



1 For example,

(a) Graphs: How to Make and Use Them — H. Arkin and R. Colton. 2nd ed.
Harper.

(b) Engineering and Scientific Graphs for Publication. American Standards
Association, New York.

(c) Reference 8 in our Introduction.

3. Frequency Polygon. We present now a discussion of the
graphs that are used in connection with frequency distributions. A
distribution of discrete variates may be represented graphically by
plotting the points $(x_1, f_1)$, $(x_2, f_2)$, ..., $(x_k, f_k)$, and drawing a broken
line through them. Such a graph is called a frequency polygon
because it is a polygon formed by connecting the tops of a series of
ordinates whose lengths are proportional to the various frequencies
and whose abscissas correspond to the variate values of the
distribution. Figure 1 will serve as an illustration. For a table of
discrete variates the function exists only for the given values.
Likewise, its graph is discontinuous. The straight lines connecting the
points serve merely to "carry the eye," thus giving a better idea of
the shape and position of the distribution.

Fig. 1 — Frequency Polygons for Distributions of Discrete Variates
(I: Frequency Polygon for the Distribution of Table 10; II: Frequency
Polygon for the Distribution of Table 11; horizontal scale 0 to 12)



Fig. 2 — Histogram for Table 6 

4. Histogram. If the frequency distribution is one of grouped 
variates (discrete or continuous) it is better to use some form of 
graphical representation which recognizes the fact that the several 
measurements in a table do not lie precisely at the class marks but 
are spread out over the intervals of which the class marks are centers. 
This may be accomplished through the use of a histogram . A histo- 
gram is a series of rectangles erected at the class boundaries with 
altitudes proportional to the respective class frequencies, and cen- 
tered on the class marks. Thus the frequencies are represented by 
areas. (See Figure 2.) If the bases are all of unit length then the 
altitudes are also equal to the frequencies. The histogram is an 
important and useful graphical device for representing frequency 
distributions. 
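The construction can be sketched in Python for the data of Table 6 without drawing anything. Taking each altitude as frequency divided by width is one possible scaling, assumed here so that each area equals its class frequency; the text requires only proportionality. The variable names are illustrative.

```python
freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]      # Table 6
bounds = [54.5 + 10 * i for i in range(10)]         # class boundaries 54.5 ... 144.5

rectangles = []
for left, right, f in zip(bounds, bounds[1:], freqs):
    width = right - left
    height = f / width            # altitude chosen so that area = frequency
    centre = (left + right) / 2   # the class mark
    rectangles.append((left, right, centre, height))

areas = [(right - left) * height for left, right, _, height in rectangles]
total_area = sum(areas)

for left, right, centre, height in rectangles:
    print(f"{left:6.1f} to {right:6.1f}  centred at {centre:6.1f}  altitude {height:5.1f}")
print("total area:", round(total_area, 6))
```

With this scaling the total area of the rectangles equals the total frequency N = 905.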

5. Frequency Curves. The shape of the distribution may be 
emphasized by constructing a continuous frequency curve such that 
the areas under the curve between the ordinates at the upper and
lower boundaries of the various rectangles will equal approximately
the areas of the corresponding rectangles. Thus, in Figure 3, the
area of all the rectangles represents the total frequency 1000, and the
area of the three rectangles labeled A, B, C represents the number
of individuals weighing between 139.75 pounds and 169.75 pounds.
The dotted line represents roughly the frequency curve
corresponding to the histogram.

Representing each class frequency of a distribution of continuous
variates by a rectangle is equivalent to saying that we realize that
the function exists for points other than the class marks, but we do
not know what it is for these points, and so as a first approximation
we assume that the variates are uniformly distributed over each
interval, which is equivalent to regarding them as concentrated at the
class marks. If the class intervals were made smaller and smaller
and at the same time the number of variates were proportionally
increased, the upper bases of the rectangles would approach more
and more the frequency curve which represents the ideal or
theoretical mathematical function relating frequency with variate value
for the given distribution.

Fig. 3 — Histogram and Frequency Curve for a Distribution
of Continuous Variates: Frequency Distribution of the Weights of 1000
Male Students (Table 12). Horizontal axis: weight in pounds, 99.75 to
239.75; vertical axis: frequency, 0 to 250.

A frequency curve is often drawn for convenience in describing
the properties of an observed distribution, although strictly
speaking, the concept of a frequency curve is applicable only to an infinite
"universe" of continuous variates. The data at hand are supposed
to be a "sample" from the universe represented by the frequency
curve.

The more common types of distributions may be represented by 
bell-shaped curves which are either symmetrical or skew. For ele- 
mentary purposes it is sufficient to consider frequency distributions 
as of these two general types. In passing, we may also mention two 
other types which are known as J-shaped and U-shaped. For ex- 
amples of these types see Yule and Kendall, An Introduction to the 
Theory of Statistics, Ch. VI. 



Fig. 4 — Ogive for Table 7 


6. Ogives. The graphs of cumulative frequencies are called 
ogives . The ogive for Table 7 is shown in Figure 4 and is constructed 
by plotting the points (54.5, 0), (64.5, 3), etc., as in algebra, and 
joining them with straight lines. 

The student should observe that while cum f is a function of x it is
defined for the end-x values only. Occasions will perhaps arise when
we desire the x-value corresponding to some intermediate cum f
value, say 453 in Figure 4. Conversely, we might wish to know the
cum f value for some intermediate x-value, say at x = 97. Strictly
speaking, we do not know the answer in either case, inasmuch as we
do not know how the IQ's are distributed over the interval.
Perhaps all the individual values in the interval 94.5-104.5 (say) are
less than 97; perhaps none are. The fairest assumption we can make
is that they are uniformly distributed throughout the interval. This
means graphically that we represent cum f over each interval by a
straight line, as is done in Figure 4. We may now interpolate under
this line for intermediate values. This is "straight line interpolation"
and is what the student uses when he interpolates in logarithms.

More refined methods exist for interpolating values of a function 
between the observed values but their study constitutes a separate 
branch of mathematics beyond the scope of this course. It should 
be observed that the straight line used here is a first approximation 
to the unknown function, and not merely a device to carry the eye 
as in the case of a frequency polygon for a discontinuous distribution 
of discrete variates. 
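The two readings mentioned above (cum f at x = 97, and the x-value for cum f = 453) can be sketched in Python under the stated assumption of uniform distribution within each interval. The function name `interpolate` is hypothetical; the data are the end-x and cum f columns of Table 7.

```python
end_x = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum_f = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

def interpolate(xs, ys, x):
    """Straight-line interpolation of y at x between tabulated points.

    xs must be increasing; x must lie within the tabulated range.
    """
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the tabulated range")

# cum f for the intermediate value x = 97 (interval 94.5-104.5).
cum_at_97 = interpolate(end_x, cum_f, 97)

# Inverse reading: the x-value at which cum f reaches 453.
x_at_453 = interpolate(cum_f, end_x, 453)

print(round(cum_at_97, 2), round(x_at_453, 2))
```

Swapping the roles of the two columns gives the inverse reading, which is legitimate here because the cum f column increases at every step.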

7. Relation of Cum f to Areas. The sum of the frequencies (cum f)
up to any value of x means, graphically, the sum of the areas of
the rectangles of the histogram up to that value. Thus in Figure 4,
the ordinate erected at x = 84.5 represents the sum of the frequencies
(3 + 21 + 78) = 102 (Figure 2). If a frequency curve represents
the distribution, then cum f, corresponding to any value of x, is the
area under the curve up to that value. Thus, in Figure 3, cum f
corresponding to x = 139.75 is approximately the area under the
smooth curve up to x = 139.75, and the total area under the curve
is cum f = N.


Exercises 

1* (a) If f(x) = 2x z exhibit /(—&). Give the value of/(3), of /( — 2). 
w (b) Let f(x) denote a given function which is defined for all real values of 
x under consideration so that if c is any admissible number /(c) is 
defined. What is the graphical meaning of /(c)? 

2. If 4>(x) = Ke~*~ y (a) show that <f>( x) — <f>(—x); (6) give the value of 

3. If $h(x) = ax^2 + bx + c$, and $h(x) = h(-x)$, show that b = 0.

4. If $f(x) = a^x$, show that $f(u) \times f(v) = f(u + v)$.

5. If $g(x) = \log\{(1 - x)/(1 + x)\}$, show that $g(u) + g(v) =
g\{(u + v)/(1 + uv)\}$.

6. Make a histogram for the data of Table 4. 

7. Same as exercise 6 for Table 8 or 9. 

8. Construct an ogive for the cumulative frequencies given in Table 12. 

9. Find the cumulative frequencies and construct the ogive for Table 9. 

10. For further discussion of ogive curves and their uses, read the following
references:

(a) Elements of Statistics, Davis and Nelson, pp. 23-28.

(b) The Mathematics of Statistics, Burgess, pp. 61-72.

1. It was pointed out in Chapter I that classification of the variates
of any long series is the first step necessary to overcome the
confusion of detail in the original observations, and to make comparisons
with other distributions possible. In Chapter II graphical
methods were studied which describe, to some extent, the shape and
position of the distribution. Although these methods are helpful,
their contribution is largely qualitative.

It is desirable to formulate quantitative descriptions for characterizing
a distribution, and as an aid in this direction averages are very
useful. They are also called measures of location. An average is a
quantity locating a central value of the distribution. In a sense,
it is a typical value of the whole set of variates, although it is not
necessary that it actually have the value of one of the items of the
set it represents. There are five averages in common use. These
are: arithmetic mean, mode, median, geometric mean, and harmonic
mean. The means and median are most frequently used although
the arithmetic mean is by far the most important in general statistical
work, and the others are of service in special cases. We will
consider them in the order named. First, however, it will be desirable
to discuss certain symbols and notation which will facilitate the
development of formulas.

2. Notation. If x denotes a variable, then $x_1, x_2, \ldots$ are
general symbols for the values which x may take. When we are concerned
with a sum like the following,

    $$x_1 + x_2 + x_3 + x_4 + \cdots + x_i + \cdots + x_N,$$

it is customary to designate it by placing the Greek capital letter
$\Sigma$ (sigma) before the general term, thus

    $$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_i + \cdots + x_N.$$

The symbol $\sum$ is a sort of mathematical verb and the notation
written above and below it may be called adverbs. Mathematicians
call $\sum$ an operator and speak of the "adverbs" as limits. When
$\sum_{1}^{N}$ is placed before any quantity, it means, "add up all quantities
like ... which are formed by giving i the values of every positive
integer from i = 1 to i = N, inclusive." Thus if $x_i$ stands for "variates"
in Table 1, $x_1$ refers to the first value 75, $x_2$ refers to the second value
80, etc., and $x_{100}$ refers to the last value 56. Here N = 100. Hence
the compact notation $\sum_{i=1}^{100} x_i$ denotes the sum of all the variates in
Table 1. The symbol $\sum_{i=1}^{N} x_i$ is read, "the summation of x-sub-i, i
varying (or running) from one to N." The subscript i is called the
index of summation. Any letter may be used as an index but it is
conventional to use i or j. Also the upper limit may be denoted by
any letter but we shall use N to denote the total number of variates
(some of which may be alike) in a set.

If a variable x is to take on the particular values 1, 2, 3, etc.,
instead of the general values $x_1, x_2, x_3$, etc., then x itself becomes the
index of summation and we write x = 1 underneath $\sum$. Thus

    $$\sum_{x=1}^{N} x = 1 + 2 + 3 + \cdots + N,$$

    $$\sum_{x=1}^{N} x^2 = 1 + 2^2 + 3^2 + \cdots + N^2.$$

Frequently the index of summation is understood from the context
and the notation at the top and bottom of $\sum$ may be omitted if no
ambiguity results.

It is imperative that the student master, as soon as possible, the
significance and utility of the $\sum$ notation.
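
The summation symbol may be thought of as a loop over the index. The
following Python sketch is only an illustration of that reading; the
helper name `summation` and the sample values are assumptions made for
this example and are not taken from Table 1.

```python
# A minimal sketch of the summation notation; the data are illustrative only.

def summation(term, lower, upper):
    """Add up term(i) for every integer i from lower to upper, inclusive."""
    total = 0
    for i in range(lower, upper + 1):
        total += term(i)
    return total

x = [6, 8, 7, 6, 5, 7, 6, 5]   # hypothetical variates x_1, ..., x_N
N = len(x)

# Sum of x_i for i = 1, ..., N (the list is 0-indexed, hence x[i - 1]).
sum_x = summation(lambda i: x[i - 1], 1, N)

# When x itself is the index of summation: 1 + 2 + ... + N and 1 + 2^2 + ... + N^2.
sum_integers = summation(lambda v: v, 1, N)
sum_squares = summation(lambda v: v ** 2, 1, N)

print(sum_x, sum_integers, sum_squares)
```

The lower and upper limits are passed explicitly, which mirrors the
"adverbs" written below and above the symbol.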

Illustrations:

1. $\sum_{1}^{N} 3x_i = 3x_1 + 3x_2 + \cdots + 3x_N = 3(x_1 + x_2 + \cdots + x_N)$.

2. $\sum_{i=1}^{5} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c)
   = (x_1 + x_2 + x_3 + x_4 + x_5) + 5c$.

3. $\sum_{i=1}^{4} x_i f_i = x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4$.

4. $\sum_{j=1}^{4} x_j y_j = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4$.

5. $\sum_{x=1}^{N} x^2 = 1^2 + 2^2 + 3^2 + \cdots + N^2$.

The following simple theorems will be useful in our work.

Theorem I. The summation of an algebraic sum of two or more
terms is the same algebraic sum of the summations of these terms taken
separately. In symbols:

    $$\sum_{i=1}^{N} (x_i + y_i - z_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} z_i.$$

Theorem II. A constant factor may be removed from under the
summation sign and written outside as a factor. Thus,

    $$\sum_{i=1}^{N} c x_i = c \sum_{i=1}^{N} x_i.$$

Proofs: It is left as an exercise for the student to prove these two
theorems by expanding the expressions.

Theorem III. If the expression under $\sum_{i=1}^{N}$ is a constant c, the
expanded result is Nc.

Examples:

1. $\sum_{i=1}^{N} c = c + c + \cdots + c = Nc$.

2. $\sum_{i=1}^{N} (x_i - c) = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} c$, by Theorem I
   $= \sum_{i=1}^{N} x_i - Nc$, by Theorem III.

The above theorems hold also if we replace the notation
$\sum_{i=1}^{N}$ by $\sum_{x=1}^{N}$, etc.
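
A numerical check is no substitute for the proofs left to the student,
but it may help fix the meaning of the three theorems. The sketch below
uses made-up values for x, y, z, and c; all names are assumptions
introduced for illustration.

```python
# Numerical check of Theorems I-III on made-up data (not a proof).
x = [3, 1, 4, 1, 5]
y = [9, 2, 6, 5, 3]
z = [5, 8, 9, 7, 9]
c = 7
N = len(x)

# Theorem I: the sum of (x_i + y_i - z_i) equals sum(x) + sum(y) - sum(z).
lhs_1 = sum(xi + yi - zi for xi, yi, zi in zip(x, y, z))
rhs_1 = sum(x) + sum(y) - sum(z)

# Theorem II: a constant factor may be taken outside the summation.
lhs_2 = sum(c * xi for xi in x)
rhs_2 = c * sum(x)

# Theorem III: summing the constant c over N terms gives N*c.
lhs_3 = sum(c for _ in range(N))
rhs_3 = N * c

print(lhs_1 == rhs_1, lhs_2 == rhs_2, lhs_3 == rhs_3)
```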

The next two theorems have to do with summing integers. The
numbers used in counting,

    1, 2, 3, 4, 5, ...

are called integers or natural numbers.

Theorem IV. The sum of the first N integers is N(N + 1)/2.
In symbols:

    $$\sum_{x=1}^{N} x = \frac{N(N + 1)}{2}.$$

This result follows from the fact that the integers form an arithmetic
progression.

Theorem V. The sum of the squares of the first N integers is
N(N + 1)(2N + 1)/6. In symbols:

    $$\sum_{x=1}^{N} x^2 = \frac{N(N + 1)(2N + 1)}{6}.$$

Proof: Let us take the identity $x^3 - (x - 1)^3 = 3x^2 - 3x + 1$,
and sum each side for x = 1 to N. Thus,

    $$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = \sum_{x=1}^{N} [3x^2 - 3x + 1].$$

Applying Theorems I-III to the right member we have

    $$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N.$$

Performing the indicated sum in the left member, we have

    $$(1^3 - 0^3) + (2^3 - 1^3) + \cdots + [N^3 - (N - 1)^3] = N^3,$$

since every term except $N^3$ is cancelled by its neighbor.
Therefore $N^3 = 3\sum x^2 - 3\sum x + N$.

Hence, using Theorem IV and simplifying,

    $$\sum_{x=1}^{N} x^2 = \frac{2N^3 + 3N(N + 1) - 2N}{6},$$

whence

    $$\sum_{x=1}^{N} x^2 = \frac{N(N + 1)(2N + 1)}{6}.$$
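
The two closed forms, and the collapsing sum used in the proof, can be
compared with direct addition for small N. The sketch below is only a
check on a finite range; the function names are assumptions chosen for
this illustration.

```python
def sum_first_integers(n):
    """Closed form of Theorem IV: 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def sum_first_squares(n):
    """Closed form of Theorem V: 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6

for n in range(1, 51):
    assert sum_first_integers(n) == sum(range(1, n + 1))
    assert sum_first_squares(n) == sum(v * v for v in range(1, n + 1))
    # The left member in the proof of Theorem V collapses to n^3.
    assert sum(v ** 3 - (v - 1) ** 3 for v in range(1, n + 1)) == n ** 3

print(sum_first_integers(100), sum_first_squares(100))
```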

3. Arithmetic Mean. The arithmetic mean of a set of variates is
defined as the sum of the variates divided by their number. We are
thinking now of a set of ungrouped variates, like that of Table 1. If
we use the symbol $\bar{x}$ to represent the arithmetic mean of the N
variates $x_1, x_2, x_3, \ldots, x_N$, then

    $$\bar{x} = \frac{1}{N}(x_1 + x_2 + x_3 + \cdots + x_N),$$

or using the more compact notation of the preceding section, we have

(1)    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

Each item in the set is thus represented in the arithmetic mean in
proportion to its magnitude.

As an illustration, it is easily verified that for the set of grades given
in Table 1,

    $$\bar{x} = \frac{7267}{100} = 72.67.$$

Computing the mean¹ strictly according to definition (1) may be
called the serial method to distinguish it from other methods which
will be presented. This definition is applicable when N is so small
that a grouping of the variates into a frequency distribution is not
feasible.

If x refers to the integers from 1 to N their mean is

(1a)    $$\bar{x} = \frac{1}{N}\sum_{x=1}^{N} x = \frac{N + 1}{2}.$$

¹ When there is no ambiguity, the arithmetic mean is often referred to as the
mean.

4. Weighted Arithmetic Mean. It will be noticed that several of
the grades given in Table 1 are alike. For example, 80 occurs seven
times. It should be evident that the same result would be found for
the mean if, instead of summing the individual values, each value were
first multiplied by the frequency with which it occurs and all such
products were then added. In general, if the values $x_1, x_2, \ldots, x_k$
occur with corresponding frequencies $f_1, f_2, \ldots, f_k$, respectively,
where $f_1 + f_2 + \cdots + f_k = N$, it follows that

    $$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k},$$

or, in shorter notation,

(2)    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{k} f_i x_i, \quad \text{where } N = \sum_{i=1}^{k} f_i.$$

When obtained in this way, $\bar{x}$ is generally called a weighted arithmetic
mean. The term originated in experimental science where
some readings which have been made under more favorable conditions
are "weighted" according to their reliability or importance. When
the weights have been chosen, they become, essentially, frequencies.

If the x's are added individually, the f's become unity, and equation
(2) reduces to (1). The student should notice that, for the
same data, $\sum_{1}^{k} x_i f_i$ is numerically equal to $\sum_{1}^{N} x_i$. He should also
observe that N refers to the number of variates in the set (some of
which may be alike), whereas k refers to the number of different values
of x in the set and hence to the number of products of the form $x_i f_i$
where $f_i$ is the number of times $x_i$ occurs. In the following example,
N = 8 and k = 4.

Example. For the values 6, 8, 7, 6, 5, 7, 6, 5,

    $$\sum_{i=1}^{8} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 6 + 8 + 7 + 6 + 5 + 7 + 6 + 5 = 50.$$

    $$\sum_{i=1}^{4} f_i x_i = f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4 = 2 \cdot 5 + 3 \cdot 6 + 2 \cdot 7 + 1 \cdot 8 = 50, \qquad \sum_{i=1}^{4} f_i = 8.$$

By either method, $\bar{x} = 50/8 = 6.25$.
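
The agreement of the two methods can be reproduced mechanically. The
following Python sketch tallies the frequencies of the example with
`collections.Counter` and applies formulas (1) and (2); the variable
names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

values = [6, 8, 7, 6, 5, 7, 6, 5]   # the example above, N = 8

# Serial method, formula (1): add the variates one by one.
N = len(values)
mean_serial = Fraction(sum(values), N)

# Weighted method, formula (2): tally each distinct value with its frequency.
freq = Counter(values)               # distinct values 5, 6, 7, 8, so k = 4
k = len(freq)
mean_weighted = Fraction(sum(f * x for x, f in freq.items()), sum(freq.values()))

print(k, float(mean_serial), float(mean_weighted))
```

Exact fractions are used so that the comparison of the two means is not
disturbed by rounding.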


Exercises 


1. Write in expanded form: 

k k 

(«)3 !></<; Q>) 

1 

2. Write in expanded form: 


(a) 


m 4*»* 

® X /« 

*®»i41 


to 23 to — $)/*. 

i~l 


to E^/i 4" 23 xffu 

. i~l i — m 41 


3. Express 2(c) as a single summation, if $n_1 + n_2 = k$.

4. Write in the abbreviated form, using $\sum$:

(a) $x_1 f_1 + x_2 f_2 + \cdots + x_k f_k$.

(b) $(x_1 - \bar{x}) f_1 + (x_2 - \bar{x}) f_2 + \cdots + (x_k - \bar{x}) f_k$.

(c) $\frac{1}{N}[(x_1 - \bar{x})^2 f_1 + (x_2 - \bar{x})^2 f_2 + \cdots + (x_k - \bar{x})^2 f_k]$.

5. Prove:

(a) $\sum_{1}^{k} (x_i + 1)^2 f_i = \sum_{1}^{k} x_i^2 f_i + 2\sum_{1}^{k} x_i f_i + N$.

(b) $\sum_{x=0} x(x - 1)p = \sum_{x=2} x(x - 1)p$.

6. Compute the value of exercise 1(c) for the example in §4, using the following
form:

    x_i    f_i    (x_i - x̄)    (x_i - x̄) f_i
     5      2      -1.25         -2.50
     6      3
     7      2        ?             ?
     8      1

    $\sum (x_i - \bar{x}) f_i = ?$

7. Distinguish between $\sum_{i=1}^{N} x_i y_i$ and $\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)$. Write in expanded form.

8. (a) Express in $\sum$ notation: Each different variate is multiplied by its own
f and the sum of the results is divided by N.

(b) Give word statements of the expressions in Exercise 4.

(c) Express the general polynomial of degree n in x,

    $$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$

in $\sum$ notation.

9. Using the identity

    $$x^2 - (x - 1)^2 = 2x - 1$$

derive the result

    $$\sum_{x=1}^{N} x = \frac{N(N + 1)}{2}$$

by a method analogous to the proof of Theorem V.

10. (a) Express in abbreviated notation: The sum of the squares of the x's
divided by the square of their sum.

(b) If x refers to the integers from 1 to N, evaluate your answer to (a) in
terms of N.

(c) Show that the mean of the first N integers is (N + 1)/2.

5. Arithmetic Mean from Frequency Table. The variates in each
class interval of a frequency distribution are assumed to have the
value of the class mark for that interval. Therefore, we may use
formula (2) to find the mean of a frequency distribution. In this
case, $x_i$ represents the mid-value of the ith class interval, $f_i$ the
corresponding frequency, and k the number of intervals; i running from 1
to k. The method of applying (2) is illustrated in Table 14 from the
data of Table 2.

Table 14

    Class        Class Mark    Frequency     Product
    Interval         x             f            fx
    30-39          34.5            2           69.0
    40-49          44.5            3          133.5
    50-59          54.5           11          599.5
    60-69          64.5           20         1290.0
    70-79          74.5           32         2384.0
    80-89          84.5           25         2112.5
    90-99          94.5            7          661.5
    Totals                     Σf = 100    Σfx = 7250.0

    $$\bar{x} = \frac{7250}{100} = 72.50.$$

If we denote the class interval by c then it is obvious that c = 10 in Table 14.

In this connection it is interesting to note that our result here
differs very little from the true value 72.67 and therefore our assumption
that all values in a given class may be taken as the class mark
seems to cause little error in the result obtained for the mean. This
can be proved mathematically (under certain assumptions) and will
be referred to later.
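
The computation of Table 14 can be written out as a short program. The
sketch below copies the class marks and frequencies from the table and
applies formula (2); the function name `grouped_mean` is an assumption
introduced for this illustration.

```python
# Class marks and frequencies copied from Table 14.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

def grouped_mean(x, f):
    """Formula (2): the sum of the products f*x divided by the total frequency."""
    total_frequency = sum(f)
    total_product = sum(fi * xi for xi, fi in zip(x, f))
    return total_product / total_frequency

mean_grouped = grouped_mean(class_marks, frequencies)
print(mean_grouped)   # to be compared with the serial value 72.67
```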

6. Translation of Axes; Deviations. It is frequently useful to
employ the methods and results of geometry in connection with the
problems of statistics. Foremost among these methods is the representation
of numbers by points on a line; an origin and a unit of
measure having been chosen, a coordinate is assigned to each point on
the line. When a frequency distribution is represented by a graph,
we have seen in Chapter II that the variate values are used as abscissas
or measurements along the x-axis. The mean is therefore the
point on the x-axis whose coordinates are ($\bar{x}$, 0). Its position may
be emphasized by drawing a vertical line through this point, but it is
the horizontal distance of the point from the origin and not the
vertical line which represents graphically the mean.

In discussing the variates we may often work with smaller numbers
by changing the origin of reference. If new axes, x'y', are taken
parallel to the old axes, xy, with positive directions preserved, the
axes are said to be translated from one position to the other. A translation
of axes corresponds to a transformation of coordinates. Thus if
we let

    $$x' = x - x_0, \qquad y' = y - y_0,$$

the origin is translated to the point $(x_0, y_0)$. Since the variates are
denoted by x we are concerned here only with the transformation
$x' = x - x_0$ which translates the origin to the point $(x_0, 0)$. The
variates referred to a new origin are often called deviations. In
particular if we translate the origin to the mean by letting

    $$x' = x - \bar{x},$$

then for a frequency distribution the deviations are the values
obtained by subtracting $\bar{x}$ from each of the class marks. Thus,

    $$x'_1 = x_1 - \bar{x}, \quad x'_2 = x_2 - \bar{x}, \quad \ldots, \quad x'_k = x_k - \bar{x}.$$

The units of measurement remain unchanged. Figure 5 shows the
two systems when the axes are translated to ($\bar{x}$, 0). Obviously, any
variates that are larger than $\bar{x}$ will be positive in terms of x' and any
variates smaller than $\bar{x}$ will be negative in terms of x'.

7. Properties of $\bar{x}$. There are two important properties of $\bar{x}$
which may be stated in the following theorems:

Theorem VI. The algebraic¹ sum of the deviations of all the variates
from their arithmetic mean is zero.

Proof: Let x' represent a deviation from the mean. Multiplying
each different deviation by the number of times it occurs and adding
these products we have,

    $$\sum_{1}^{k} f_i x'_i = \sum_{1}^{k} f_i (x_i - \bar{x})$$

    $$= \sum_{1}^{k} f_i x_i - \sum_{1}^{k} f_i \bar{x}, \quad \text{by Theorem I}$$

    $$= \sum_{1}^{k} f_i x_i - \bar{x}\sum_{1}^{k} f_i, \quad \text{by Theorem II.}$$

Recalling from (2) that $\sum_{1}^{k} f_i x_i = N\bar{x}$, and that $\sum_{1}^{k} f_i = N$, we have

(3)    $$\sum_{1}^{k} f_i (x_i - \bar{x}) = N\bar{x} - \bar{x}N = 0.$$
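
Theorem VI can be checked on the data of Table 14. The sketch below
uses exact fractions so that the sum comes out exactly zero rather than
approximately so; the variable names are assumptions made for
illustration.

```python
from fractions import Fraction

# Class marks and frequencies copied from Table 14.
class_marks = [Fraction(v) for v in
               ("34.5", "44.5", "54.5", "64.5", "74.5", "84.5", "94.5")]
frequencies = [2, 3, 11, 20, 32, 25, 7]

N = sum(frequencies)
mean = sum(f * x for x, f in zip(class_marks, frequencies)) / N

# Deviations x' = x - mean, each weighted by its frequency as in (3).
deviations = [x - mean for x in class_marks]
weighted_sum = sum(f * d for f, d in zip(frequencies, deviations))

print(float(mean), [float(d) for d in deviations], weighted_sum)
```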
i 

Theorem VII. If the variates are referred to a new origin $x_0$ and
expressed in units of c by means of the transformation

(4)    $$u = \frac{x - x_0}{c}, \quad (c \neq 0),$$

then the old mean, $\bar{x}$, is related to the new mean, $\bar{u}$, by the following
formula:

(5)    $$\bar{x} = c\bar{u} + x_0.$$

Proof: From (4),

(4a)    $$x = cu + x_0$$

and substitution of this value for x in definition (2) gives

    $$\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (c u_i + x_0).$$

By Theorems I and II this equals

    $$c \cdot \frac{1}{N}\sum_{1}^{k} f_i u_i + x_0 \cdot \frac{1}{N}\sum_{1}^{k} f_i.$$

But the first of these expressions is, by definition, c times the mean
value of u, and the second is, from (2), simply $x_0$. Therefore

    $$\bar{x} = c\bar{u} + x_0.$$

This is an important relation and its derivation should be mastered.
Observe that the size of a u-unit will be c times as large as the size
of an x-unit.

Corollary. If the mean of the deviations of the variates from any
arbitrary number, $x_0$, is found and added algebraically to $x_0$, the result is
the mean $\bar{x}$. In symbols,

(6)    $$\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (x_i - x_0) + x_0.$$

The proof follows from (4) and (5).

In (5) and (6), $x_0$ may be regarded as a provisional mean, and the
first term in the right members may be thought of as the correction
to be added algebraically to the provisional mean in order to get the
true mean.

¹ That is, taking account of signs. Some of the deviations will be positive and
some negative.

8. Short Methods of Computing $\bar{x}$. In certain cases, the method
of computing the mean by (2), as shown in Table 14, can be simplified
by use of Theorem VII.

Case I (class intervals equal). If the class marks are equispaced,
let c equal the class interval and choose $x_0$ as one of the class marks,
usually the one opposite the largest frequency. From (4), $x_0$ becomes
the origin of u, because when $x = x_0$, u = 0.

The method of using (5) is illustrated in Table 15. Here
c = 10 and we choose $x_0 = 74.5$, so (4) becomes

    $$u = \frac{x - 74.5}{10}.$$

Substituting the given values of x in this relation we get the values in
the u column. So in running the fu column, small values of u are
multipliers of the larger values of f. Then

    $$\bar{u} = \frac{-20}{100} = -0.2,$$

so from (5),

    $$\bar{x} = 10(-0.2) + 74.5 = 72.5\%.$$

It should be evident that the final value obtained for the mean is
independent of the choice of the arbitrary value $x_0$. This choice is
only a rough guess and it is really immaterial which of the given
values is selected as $x_0$, except that the nearer it is to the mean the
lighter will be the calculations to follow. A check on the arithmetic
may, therefore, be effected by selecting a different provisional mean.


Table 15. Mean of 100 Grades Using Class Interval as Unit

      x       u       f      fu
    34.5     -4       2      -8
    44.5     -3       3      -9
    54.5     -2      11     -22
    64.5     -1      20     -20
    74.5      0      32       0
    84.5      1      25      25
    94.5      2       7      14
    Totals          100     -20


This indirect method is sometimes called coding because the vari- 
ates are coded to another scale in which it is easier to compute the 
mean. Formula (5) is the relation, then, for transforming the mean 
from one scale to another. 
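
The coding method of Table 15 may be sketched as follows. The function
name `coded_mean` and its argument names are assumptions introduced for
this illustration; the data are the class marks and frequencies of
Table 15. The second call uses a different provisional mean, which
changes the coded mean but not the final result.

```python
from fractions import Fraction

# Class marks and frequencies copied from Table 15.
class_marks = [Fraction(v) for v in
               ("34.5", "44.5", "54.5", "64.5", "74.5", "84.5", "94.5")]
frequencies = [2, 3, 11, 20, 32, 25, 7]

def coded_mean(x, f, x0, c):
    """Return (mean of x, mean of u) using u = (x - x0)/c and relation (5)."""
    u = [(xi - x0) / c for xi in x]
    mean_u = sum(fi * ui for fi, ui in zip(f, u)) / sum(f)
    return c * mean_u + x0, mean_u

c = 10
mean_x, mean_u = coded_mean(class_marks, frequencies, Fraction("74.5"), c)
print(float(mean_u), float(mean_x))

# A different provisional mean changes mean_u but leaves the mean of x unchanged.
mean_x_alt, mean_u_alt = coded_mean(class_marks, frequencies, Fraction("64.5"), c)
print(float(mean_u_alt), float(mean_x_alt))
```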

If one's statistical interests are limited to computing means, then
(2) cannot be improved upon if calculating machines are to be used.
It should be understood, however, that techniques must be developed
now for subsequent purposes. The indirect method is part of
a pattern which is useful in later chapters. From this standpoint,
one should practice using it at this stage when $N = \sum_{1}^{k} f_i$ is large
and the x's are equispaced.

Case II (class intervals unequal). Occasionally a frequency distribution
is encountered in which the variates are not equispaced;
it is then usually best to take c = 1 (unless the x's have a common
factor c) and be content with whatever simplification results from a
suitable choice of $x_0$. This is equivalent to using the above corollary.

In Table 16, we choose $x_0 = 200$ and are thus able to simplify the
work a little.

Table 16

       x        f         u           uf
    106.12      7      -93.88      -657.16
    191.83     14       -8.17      -114.38
    246.48     32       46.48      1487.36
    283.63     49       83.63      4097.87
    257.65     55       57.65      3170.75
    294.51     54       94.51      5103.54
    222.53     35       22.53       788.55
     71.43     14     -128.57     -1799.98
    Totals    260                 12076.55

    $$\bar{u} = \frac{1}{N}\sum f_i (x_i - 200) = \frac{12076.55}{260} = 46.448,$$

    $$\bar{x} = \bar{u} + x_0 = 246.45.$$
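
The arithmetic of Table 16 can be reproduced with decimal numbers, which
avoids binary rounding in the two-place values. This is only a sketch;
the variable names are assumptions made for illustration, and the data
are copied from the table.

```python
from decimal import Decimal

# Values and frequencies copied from Table 16; the x's are not equispaced.
x = [Decimal(v) for v in ("106.12", "191.83", "246.48", "283.63",
                          "257.65", "294.51", "222.53", "71.43")]
f = [7, 14, 32, 49, 55, 54, 35, 14]
x0 = Decimal("200")

# With c = 1 the coded variate is simply the deviation from x0.
u = [xi - x0 for xi in x]
sum_uf = sum(fi * ui for fi, ui in zip(f, u))
N = sum(f)

mean_u = sum_uf / N
mean_x = mean_u + x0      # corollary (6)
print(sum_uf, N, round(mean_u, 3), round(mean_x, 2))
```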


9. Geometric Explanation. Let us consider further the relation
between the variables x and u, defined by the expression

(4)    $$u = \frac{x - x_0}{c}.$$

A geometric explanation will be helpful.

Graphically, the x values are distances along the x-axis measured
from zero as origin. Likewise $x_0$ is some point on the x-axis at a
distance of $x_0$ units from zero. If now the points representing the
x values are measured from $x_0$ as origin they are denoted by $x - x_0$.
(See Figure 6.) Thus if $x_0 = 24$, a value which is 36 with reference
to the origin of x will be 12 with reference to $x_0$; likewise a value
x = 18 becomes $x - x_0 = -6$ when referred to $x_0$ as origin.

[Fig. 6. The x-axis, showing the new origin $x_0$ and the distance $x - x_0$.]

It should be noted that $x - x_0$ is in the same units as x. Thus if x is in
inches, $x - x_0$ will also be in inches. But $(x - x_0)/12$ would then
be in feet. Instead of dividing by 12 suppose we divide by c. Then
$(x - x_0)/c$ will be in units of c whatever c may be. It is convenient
to denote the resulting values by a different letter, say u. Therefore
the numerator of (4) changes the origin of reference but does
not affect the scale of measurement. The denominator changes the
scale, there being c of the x units in one of the u units. Relation (4)
has this generalized meaning apart from statistics. Mathematical
notation is applicable to many different fields of knowledge. A relation
like (4) which occurs in physics is C = (5/9)(F - 32); it connects
temperature on the Centigrade and Fahrenheit scales.
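
The temperature relation gives a convenient illustration of how a mean
passes from one scale to another. In the sketch below the Fahrenheit
readings are hypothetical values chosen only for illustration, and the
function name is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical Fahrenheit readings, chosen only for illustration.
fahrenheit = [50, 59, 68, 77, 86]

def to_centigrade(f):
    """C = (5/9)(F - 32), that is, relation (4) with x0 = 32 and c = 9/5."""
    return Fraction(5, 9) * (f - 32)

centigrade = [to_centigrade(f) for f in fahrenheit]

mean_f = Fraction(sum(fahrenheit), len(fahrenheit))
mean_c = sum(centigrade) / len(centigrade)

# Converting the mean gives the same value as averaging the converted readings.
print(float(mean_f), float(mean_c), to_centigrade(mean_f) == mean_c)
```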

When (4) is applied to a frequency distribution it is convenient to
select $x_0$ as one of the mid-x values and to take c as the width of the
class intervals. Under Case I, the mean is found with reference to $x_0$
and in units of c. This is the mean, $\bar{u}$, of the numbers representing the
various class intervals weighted with the corresponding frequencies.
After this mean is computed it may be converted back into units of x
by multiplying by c, and then referred to the origin of x by adding $x_0$.
(See Figure 7.) Hence we have $\bar{x} = c\bar{u} + x_0$. Thus we arrive at the
same result as that obtained algebraically.

If we had denoted the variates by y we could have used the relation

    $$v = \frac{y - y_0}{c}$$

corresponding to (4). Geometrically, this would mean a change of
units and a translation of origin in the y-direction. The relation
corresponding to (5) would then be

    $$\bar{y} = c\bar{v} + y_0,$$

where $\bar{v} = \frac{1}{N}\sum f_i v_i$.

As the short-cut method is an important one, another illustration
is given in Table 17 (based on Table 4). Here we take
$u = (x - 2.745)/0.5 = 2(x - 2.745)$.

[Fig. 7. The mean $\bar{u}$ in units of c, and $c\bar{u}$ in original units, measured
from $x_0$ to $\bar{x}$. If $x_0 < \bar{x}$, $c\bar{u}$ is positive; if $x_0 > \bar{x}$, $c\bar{u}$ is negative.]

Table 17. Computation of Mean Monthly Rainfall at Iowa City, 1890-1925

        x           f       u       fu
      0.245        23      -5     -115
      0.745        42      -4     -168
      1.245        58      -3     -174
      1.745        62      -2     -124
      2.245        49      -1      -49
      2.745 (x_0)  47       0        0
      3.245        32       1       32
      3.745        27       2       54
      4.245        18       3       54
      4.745        15       4       60
      5.245        14       5       70
      5.745         7       6       42
      6.245        10       7       70
      6.745         5       8       40
      7.245         6       9       54
      7.745         5      10       50
      8.245         3      11       33
      8.745         2      12       24
      9.245         5      13       65
      9.745         0      14        0
     10.245         1      15       15
     10.745         1      16       16
     Totals       432               49

    $$\bar{x} = 2.745 + \frac{(0.5)(49)}{432} = 2.802 \text{ inches.}$$




10. Mean of Means. So far we have used subscripts to distinguish
between the variates within a set: $x_1, x_2, \ldots, x_N$. By this time
the student should be thinking easily in this notation so we may now
state an additional use of subscripts. Instead of using x and y to
distinguish between two sets of variates we may use $x_1$ and $x_2$. Then
to distinguish the variates within a set we would add a second subscript,
so for the $x_1$ set the variates are

    $$x_{11}, x_{12}, x_{13}, \ldots, x_{1n_1},$$

and for the $x_2$ set the variates are

    $$x_{21}, x_{22}, x_{23}, \ldots, x_{2n_2}.$$

These are read "x two one," etc., not "x twenty-one," etc. In the
notation dealing with one set, x was a variable but $x_1, x_2$, etc., were
constants. Now $x_1$ and $x_2$ are variables and $x_{11}, x_{12}, \ldots, x_{21}, x_{22}, \ldots$,
etc., are constants. Thus $x_1$ and $x_2$ may denote the grades of two
sections of mathematics in which there are $n_1$ and $n_2$ students respectively.
Then the mean of the first set is

(a)    $$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}$$

and the mean of the second set is

(b)    $$\bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}.$$

We will now state a useful theorem.

Theorem VIII. If the mean of a set of $n_1$ variates is $\bar{x}_1$ and the mean
of another set of $n_2$ variates is $\bar{x}_2$, the mean $\bar{x}$ of the combined sets is

(7)    $$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N},$$

where $N = n_1 + n_2$.

Proof: It is obvious from equations (a) and (b) that

(c)    $$n_1\bar{x}_1 + n_2\bar{x}_2 = \sum_{i=1}^{n_1} x_{1i} + \sum_{i=1}^{n_2} x_{2i}.$$

If x is allowed to stand for $x_1$ and $x_2$ in succession as shown in the
table on page 45 then the right member of (c) may be written
$\sum_{i=1}^{n_1+n_2} x_i$, which denotes the sum of all the variates when they are
combined into one set. If this latter sum is divided by the total number
of variates N the result is, by definition, their mean. Hence

    $$\frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{1}{N}\left(\sum_{i=1}^{n_1} x_{1i} + \sum_{i=1}^{n_2} x_{2i}\right) = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}.$$

We may express (7) in more compact notation as follows:

    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{2} n_i\bar{x}_i, \qquad N = \sum_{i=1}^{2} n_i.$$

This form lends itself to a generalization for k sets so we have the
following theorem.

Theorem IX. The mean of a set of N variates which is composed of k
subsets is

(8)    $$\bar{x} = \frac{1}{N}\sum_{i=1}^{k} n_i\bar{x}_i,$$

where $\bar{x}_i$ is the mean and $n_i$ is the frequency in the ith subset and
$N = \sum_{i=1}^{k} n_i$.

Corollary. If $n_i = n$ is the same for all the sets, then N = kn and
(8) reduces to

(9)    $$\bar{x} = \frac{1}{k}\sum_{i=1}^{k} \bar{x}_i.$$
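
Formula (8) may be compared with the mean obtained by pooling the
subsets directly. The subsets in the sketch below are hypothetical
values made up for this illustration, and `combined_mean` is an assumed
helper name. The last lines show that the plain average of the subset
means need not agree with the pooled mean when the subsets are of
unequal size.

```python
from fractions import Fraction

def combined_mean(sizes, means):
    """Formula (8): weight each subset mean by the number of variates in it."""
    N = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / Fraction(N)

# Hypothetical subsets, made up for this sketch.
subsets = [[70, 80, 90], [60, 65], [75, 85, 95, 55]]
sizes = [len(s) for s in subsets]
means = [Fraction(sum(s), len(s)) for s in subsets]

# Mean of all the variates pooled into one set.
pooled = [v for s in subsets for v in s]
direct = Fraction(sum(pooled), len(pooled))

print(combined_mean(sizes, means) == direct)

# Formula (9) applies only when every subset has the same size.
plain_average = sum(means) / len(means)
print(plain_average == direct)
```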


Exercises 

1. (a) Use (1), §3, to find the mean of the following numbers: 18, 42, 23, 16,
103, 61, 49, 95, 113, 10.

(b) For the numbers in (a) verify that the sum of their deviations from their
mean is zero. What theorem does this exercise illustrate?

2. Find the deviations of the numbers in Ex. 1 from 50 and verify that the mean
of these deviations added algebraically to 50 gives the mean of the numbers
themselves.

3. Prove: The sum of the deviations of the variates from their mean is zero.

4. Derive the relation $\bar{x} = c\bar{u} + x_0$.

5. Find the arithmetic mean of the weights of 1000 students given in Table 12.
Use (5). Ans. 138.65 lbs.

6. Find the mean monthly rainfall at Des Moines from 1890 to 1925, using the
frequency distribution which you previously made. Ans. 2.55 inches.

7. Find the mean of the distribution of discrete variates given in Table 11.

8. Prove the following theorem: The mean of a set of variates is unchanged if
each variate is replaced by the mean of all the variates.

9. (a) Prove expressions (8) and (9).

(b) The mean grade of one class of 20 students is 76% and of another class of
15 students is 80%. Find the mean of the two classes.

10. The record of freshman scholastic averages for a semester at a certain
university was given as follows:

                n_i      x̄_i
    Men         501     3.550
    Women       356     3.639

Find the mean grade for the entire class.

11. Assume that the following fictitious data represent the earnings per week of a
certain type of machine shop labor in Illinois establishments:

    Wage Group                  Frequency
    $ 0.0   Under $10.0             50
     10.0          20.0            150
     20.0          30.0            400
     30.0          40.0            200
     40.0          50.0            160
       *             *               *
     60.0          80.0             40
    Total                         1000

* Class omitted. Note the different interval in the last class.

The average earnings per week for this same type of labor in all other states of
the United States where 9000 men are employed, not counting those in Illinois, are
$30.00 per week.

Compute the arithmetic mean wage (a) for Illinois, (b) for the entire United
States.

Recompute the mean wage for Illinois in such a manner as to check, in the
quickest and surest way, the accuracy of the result found in (a) above.

12. Find the mean of the following distribution:

      x       f
    47.5      7
    48.1     17
    45.9     46
    44.0     44
    40.7     54
    41.6     43
    38.0     35
    33.2     14

11. The Mode. That value of the variable which occurs most
frequently is called the mode. Its chief service is in characterizing
a type and it is the kind of average meant by such a phrase as the
"average man." There is some difficulty in giving a precise definition
of the mode without more advanced mathematics. However,
we may say that for a given grouping an approximate value, which
we will call the empirical mode, is given by the class mark having the
largest frequency.¹ Thus, in Table 17 the empirical mode is 1.745
inches.
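
Finding the empirical mode amounts to locating the largest frequency.
The sketch below does this for the frequencies of Table 17. The function
name is an assumption introduced for illustration, and taking the first
class mark in case of a tie is likewise an assumption of this sketch,
since the text does not say how ties are to be treated.

```python
# Frequencies copied from Table 17.
frequencies = [23, 42, 58, 62, 49, 47, 32, 27, 18, 15, 14,
               7, 10, 5, 6, 5, 3, 2, 5, 0, 1, 1]

# Class marks run from 0.245 to 10.745 inches in steps of 0.5; they are stored
# in thousandths of an inch so that the arithmetic stays exact.
class_marks = [245 + 500 * i for i in range(len(frequencies))]

def empirical_mode(x, f):
    """Class mark with the largest frequency (the first one if there is a tie)."""
    largest = max(f)
    return x[f.index(largest)]

mode_thousandths = empirical_mode(class_marks, frequencies)
print(mode_thousandths / 1000)   # in inches
```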

12. The Median. Instead of finding the mean, suppose the N
variates are arranged in the order of their magnitude. The median is
defined as the value which is greater than half the variates and less
than the other half. A more precise definition is as follows:

Let $x_1, x_2, \ldots, x_N$ be a set of real numbers, which may or may not
be all different and suppose they are arranged in order of magnitude
so that

    $$x_1 \le x_2 \le x_3 \le \cdots \le x_N.$$

Whenever N is odd, N = 2k - 1, the median is $x_k$, the middle one of
the x's. If N is even, N = 2k, the median is not uniquely defined
unless $x_k = x_{k+1}$, in which case the median is this common value.
Otherwise, the definition is satisfied by any value of x belonging to
the interval

    $$x_k \le x \le x_{k+1},$$

and the median is to this extent indeterminate. In this case it is
conventional to take

    $$\tfrac{1}{2}(x_k + x_{k+1})$$

as the median.

Example. Find the median of the following set of numbers: 10, 6, 5, 25, 15, 18,
20.

Arranging them in order of magnitude we find the median to be 15 (the mean is
14.14). If we add another value, 37, to make N even, the median is ½(15 + 18) =
16.5 (the mean is 17).
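
The definition translates directly into a short routine: sort the
variates, then take the middle one or the mean of the two middle ones.
The sketch below applies it to the example; the function name `median`
is an assumption made for this illustration.

```python
def median(values):
    """Median of ungrouped variates; for even N the two middle values are averaged."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # N = 2k - 1: the median is x_k, the middle one of the ordered values.
        return ordered[middle]
    # N = 2k: take (x_k + x_{k+1}) / 2 by convention.
    return (ordered[middle - 1] + ordered[middle]) / 2

data = [10, 6, 5, 25, 15, 18, 20]
print(median(data))
print(median(data + [37]))
```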

¹ Another method of computing the mode will be given in a later section.

13. Median of a Frequency Distribution. Case I. For a frequency
distribution of continuous variates, the median is defined as
follows:

Definition: The median is the value of x for which cum f = N/2.

Given such a frequency distribution we may therefore find its
median by forming a cumulative frequency table and interpolating in
the end-x column for the value of x corresponding to N/2.

The method should be clear from the following illustration.

Example. Find the median for the data of Table 2.

    Interval      f      End-x     Cum f
                          29.5        0
    30-39         2       39.5        2
    40-49         3       49.5        5
    50-59        11       59.5       16
    60-69        20       69.5       36
                         <- Md     <- 50
    70-79        32       79.5       68
    80-89        25       89.5       93
    90-99         7       99.5      100

Here, N/2 = 50. This value of cum f corresponds to a value of x
in the interval 69.5-79.5. Therefore the median is 69.5 plus a fraction
of the distance from 69.5 to 79.5. Thus,

    End-x       Cum f
    69.5         36
    Median       50
    79.5         68

Here $d_1$ = Median - 69.5 and $D_1$ = 79.5 - 69.5 are the partial and total
differences in the End-x column, and $d_2$ = 50 - 36 and $D_2$ = 68 - 36 are
the corresponding differences in the Cum f column.

Assuming that the items in any class interval are uniformly distributed
over that interval, it follows that the partial differences are
proportional to the total differences: $d_1/D_1 = d_2/D_2$. That is,

    $$\frac{\text{Median} - 69.5}{79.5 - 69.5} = \frac{50 - 36}{68 - 36},$$

whence,

    $$\text{Median} = 69.5 + 10\left(\frac{14}{32}\right) = 69.5 + 4.4 = 73.9.$$

This is called "straight line interpolation" or "interpolation by
proportional parts." The reason for these names is made clear in the
following diagram.

[Diagram: triangle ABC within triangle AED, where AE = 79.5 - 69.5,
ED = 68 - 36, and BC = 50 - 36.]

Triangle ABC is similar to triangle AED, so that

    $$\frac{AB}{AE} = \frac{BC}{ED},$$

    $$x = AB = \frac{AE \cdot BC}{ED} = \frac{10(50 - 36)}{68 - 36} = 4.4,$$

    $$\therefore \text{Md} = 69.5 + x = 73.9.$$

The following formula may also be used to compute the median:

    $$\text{Md} = e_{x_m} + \left(\frac{N/2 - \sum f_u}{f_c}\right)c,$$

where $e_{x_m}$ is the lower end-value of the median class, N is the total
frequency, $\sum f_u$ the number of variates below the median class, c the
class interval, and $f_c$ the frequency of the median class.
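
The interpolation for the median can be sketched as a short routine. The
function and argument names below are assumptions introduced for this
illustration, and equal class widths are assumed, as in Table 2. The
routine accumulates the frequencies until N/2 is reached and then
interpolates within that class by proportional parts.

```python
def grouped_median(lower_ends, width, frequencies):
    """Value of x at which cum f reaches N/2, by straight-line interpolation.

    lower_ends[i] is the lower end-value of class i; all classes share `width`.
    """
    half = sum(frequencies) / 2
    below = 0   # cum f at the lower end of the current class
    for lower, f in zip(lower_ends, frequencies):
        if f > 0 and below + f >= half:
            # Proportional parts: d1/D1 = d2/D2 within the median class.
            return lower + width * (half - below) / f
        below += f
    raise ValueError("no frequencies given")

# Lower end-values and frequencies from the example (data of Table 2).
lower_ends = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

md = grouped_median(lower_ends, 10, frequencies)
print(md, round(md, 1))
```

The unrounded result is 73.875, which agrees with the value 73.9 found
above when rounded to one decimal place.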

Case II. In the case of a set of discrete variates there may be no 
value in the set such that the number of variates which are larger than 
it is equal to the number less than it. Thus in Table 11 the values of x 
are integers and 35% of the throws yielded 5 or fewer successes and 
65% yielded 6 or more successes. Neither x = 5 nor x = 6, nor any 
integer, will exactly split the total frequency into two equal parts. 
Of course a formal application of the definition given in Case I will 
give a value of x for which cum f is N/ 2. The difficulty is not so 
much in the interpretation of the fractional result because the same 
objection could be cited against the mean. But the real difficulty 
lies in explaining interpolation in a discontinuous function. We 
cannot assume that the given frequencies are distributed over the 
interval from one value of x to the next. Perhaps the best we can do 
in such cases is to make a statement similar to the one above for 
Table 11. At least such a statement serves to summarize the situa- 
tion without artificiality. 

14. Graphical Interpretation of Mean, Median, and Mode. The mean corresponds to the abscissa of the point known in mechanics as the centroid of area. If a thin, homogeneous plate of metal cut in the shape of a histogram is supported loosely on a horizontal axis through its centroid, the plate will have no tendency to rotate, whatever horizontal direction this axis may assume.

[Fig. 9]

The median of a frequency distribution is the abscissa of a point 
through which a vertical line will divide the total area of the histo- 
gram into two equal parts. 

If a distribution could be represented by a smooth curve, then the 
mode is the abscissa of the highest point on the curve. 

Figure 9 shows the position of the three averages in a moderately skew distribution. If the distribution were perfectly symmetrical then all three of these measures of location would coincide.

There is an interesting empirical relationship between the three quantities 
which appears to hold for unimodal curves of moderate asymmetry, namely, 

mean — mode = 3 (mean — median). 

It is a useful mnemonic to observe that the mean, median, and mode occur 
in the same order (or reverse order) as in the dictionary; and that the median is 
nearer to the mean than to the mode, just as the corresponding words are nearer 
together in the dictionary. 1 

15. Discussion. The student primarily interested in the use of 
these averages in practical statistics might reasonably inquire, 
“ Which of the three averages mentioned should be used in a given 
problem? ” The answer depends upon certain properties peculiar to 
each average and upon the nature of the data to be averaged. 

In most cases the mean is a distinctly superior average. It is 
rigorously defined, easily computed, and is most tractable in theoreti- 
cal discussions. 

When the median differs considerably from the mean it is likely 
that the median is the more typical value. The advantage of the 
median over the mean is recognized in at least three situations: 

(a) When occasional and unexpected values occur at the ends of 
the distribution. In such cases the mean would tend to distort the 
true representation of the typical value, being unduly influenced by 
the exceptional values. 

(b) When the data are presented in a table left open at one or both ends. For example, suppose the registrar's office of a university reports the following distribution of grades as given in all departments for a semester:

Grade     Below 60    60-69    70-79    80-89    90-100
Number       215       1060     2217     1242      506

A cum f table may be formed and hence the median can be found without any more information about the values less than 60.

(c) When the observations cannot be measured numerically but 
can be ordered. 

The mode is best adapted to situations where the word “usual” would be appropriate. Unless a large number of items are considered the mode can have little practical meaning. It is the appropriate average in certain questions of marketing because manufacturers are interested in the type or quality which is usually in demand. Or again, in an investigation concerning wages and cost of living, the mode would reflect the average situation. Also, in a mathematical treatment of frequency curves the concept of the mode is very useful.

1 M. G. Kendall, The Advanced Theory of Statistics, vol. I, p. 35. Lippincott.

Sometimes a distribution has more than one mode, although this is 
usually due to heterogeneous material. In this course we will be 
concerned only with unimodal distributions. 

The above remarks about the appropriateness of various averages 
are made from the standpoint of describing and condensing the data 
per se . A few remarks from a different point of view should perhaps 
be added here. In the theory of sampling, which deals to a large 
extent with estimating from a sample certain constants in the parent 
universe, it is shown that the mean has definite advantages. The 
mean is much more efficient 1 than the median, for example, in esti- 
mating the corresponding average in the universe (except in a special 
case when the universe is an unusual type). 

For a more complete treatment of the applicability of these three 
averages, the student is referred to the following books: 

1. Theory of Statistics — Yule and Kendall, Ch. VII. 

2. The Mathematics of Statistics — Burgess, Ch. V. 

3. Mathematical Statistics — Camp, p. 40. 

Exercises 

1. State what the empirical mode is in each of Tables 8 to 13. 

2. Explain why the median is found from interpolating in the end-x column and not the mid-x column.

3. Read one or more of the references in §15 and write an essay on the advantages and limitations of the mean, median, and mode.

4. Find the median IQ for the data in Table 7.

5. Find the median for the data in Table 9.

16. Geometric Mean. The geometric mean of a set of N positive values is the Nth root of their product. Thus, the geometric mean (G.M.) of two values is the square root of their product, of three values the cube root of their product, and in general for the N values \(y_1, y_2, \ldots, y_N\),

(10) \[ \text{G.M.} = [y_1 \cdot y_2 \cdot y_3 \cdots y_N]^{1/N}. \]

Equation (10) lends itself to the use of logarithms and frequently they greatly facilitate the computation of G.M. From (10) we have

(11) \[ \log \text{G.M.} = \frac{1}{N}[\log y_1 + \log y_2 + \cdots + \log y_N]. \]

Therefore the arithmetic mean of the logarithms of a set of values is the same as the logarithm of the geometric mean of the values themselves.

1 See Economic Control of Manufactured Products, W. A. Shewhart, p. 280. D. Van Nostrand Co.

Examples: Find the geometric mean of
(a) 3, 6, 12, 24, 48.

Solution:

\[ \text{G.M.} = [(3^5)(2^{10})]^{1/5} = (3)(2^2) = 12. \]

(b) 7.96, 13.82, 22.95, 35.34.

Solution:

log  7.96 = 0.90091
log 13.82 = 1.14051
log 22.95 = 1.36078
log 35.34 = 1.54827
      Sum = 4.95047
log G.M. = 4.95047 / 4 = 1.23762
G.M. = 17.28
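The logarithmic route of (11) can be checked numerically. The following is a minimal sketch; the helper name `geometric_mean` is an assumption made for illustration (Python's standard library has its own `statistics.geometric_mean`, which is not used here so that the steps of (11) stay visible).

```python
import math


def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the common logarithms, as in (11)."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


gm_a = geometric_mean([3, 6, 12, 24, 48])            # example (a)
gm_b = geometric_mean([7.96, 13.82, 22.95, 35.34])   # example (b)
print(gm_a, gm_b)
```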

The geometric mean is the appropriate average when the data are 
limited at one end of the range and unlimited at the other, and there 
tends to be a constant rate of change from one y value to the next. 
This is characteristic of values which tend to form a geometric pro- 
gression, i.e., which tend to follow the simple exponential law 

(12) \[ y = ar^x. \]

The student will recall from algebra that a geometric progression can be put in the form

x:    0    1     2      ...    x
y:    a    ar    ar^2   ...    ar^x

The value of any term in the y series is a function of the exponent of r since a and r are constants. The functional relationship is therefore represented by (12).

The growth of many quantities in nature follows this law and it is sometimes called the law of natural growth. With x referring to time, y may represent, for example, the population of a city, the enrollment of a school, the weight of a quantity, or the number of bacteria in a culture. The accumulated amount S of P dollars invested at rate of interest i, compounded periodically for n periods, also takes the form of (12), namely,

\[ S = P(1 + i)^n, \]

where r is now (1 + i), a is P, and n and S are the variables corresponding to x and y.

Thus, if $1000 increased at compound interest to $2150 in 31 years,

[Time line: $1000 at year 0, $2150 at year 31]

the geometric average rate at which the money increased is found as follows:

\[ r^{31} = (1 + i)^{31} = \frac{2150}{1000}, \]
\[ 1 + i = (2.15)^{1/31} = 1.025, \]
\[ i = 2\tfrac{1}{2}\%. \]

Since there was an increase of \(\frac{1150}{1000} = 115\%\), the arithmetic average would be \(\frac{115}{31} = 3.7\%\), which is also the simple interest rate.
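The contrast between the geometric (compound) rate and the arithmetic (simple) rate can be reproduced directly. The sketch below only restates the arithmetic of the example; the variable names are illustrative assumptions.

```python
start, end, years = 1000.0, 2150.0, 31

# Geometric average rate: solve (1 + i)**years = end / start for i.
geometric_rate = (end / start) ** (1 / years) - 1

# Arithmetic average rate: total percentage increase spread evenly over the years.
simple_rate = (end - start) / start / years

print(f"compound: {geometric_rate:.4f}, simple: {simple_rate:.4f}")
```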

If y in equation (12) represents population, and we are given two 
values of y corresponding to two dates N years apart, the geometric 
mean enables us to find a fairer estimate of the value of y at the mid 
date than would be given by any other average. For example, 
suppose we are given that the population of a city was 2500 in 1920 
and 5000 in 1930. We wish to estimate the population in 1925 and 
to find the average annual rate of increase. If we are given no other 
information, our best estimate for 1925 is given by 

\[ \text{G.M.} = (y_1 \cdot y_2)^{1/2} = (2500 \times 5000)^{1/2} = 3535. \]

The average annual rate of increase is obtained by solving (12) for r as follows:

\[ 5000 = 2500\,r^{10}, \qquad 2 = r^{10}. \]

Hence \(r = \sqrt[10]{2} = 1.0718 = 107.18\%\), so that the average annual rate of increase is 7.18%. It is now possible to estimate the population for any intermediate year. Thus, for 1928, we have from (12):

\[ y = 2500(1.0718)^8 = 4353. \]
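As a rough check of the population example, the annual ratio r and the intermediate estimates can be computed from (12). The function name `estimate` is hypothetical and the model assumes a constant annual ratio between the two census dates.

```python
p1920, p1930, span = 2500.0, 5000.0, 10

# Constant annual ratio r from 5000 = 2500 * r**10.
r = (p1930 / p1920) ** (1 / span)


def estimate(year):
    """Population in a given year under y = a * r**x, with x counted from 1920."""
    return p1920 * r ** (year - 1920)


print(round(r, 4), round(estimate(1925)), round(estimate(1928)))
```

The 1925 estimate coincides with the geometric mean of the two census figures because 1925 lies midway between the two dates.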

The geometric mean is also used in economics in averaging “index numbers,” which are essentially the ratios of prices of commodities at one date to their prices at another date. In general it is the appropriate average when emphasis is on the rate or percentage of change rather than the amount.

17. Harmonic Mean. Another average which has long been known and which is required in certain problems is the harmonic mean (H.M.). For the N positive values \(x_1, x_2, \ldots, x_N\), it is defined as the reciprocal of the arithmetic mean of the reciprocals of the values. In symbols,

(13) \[ \text{H.M.} = \frac{1}{\dfrac{1}{N}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_N}\right)}. \]

This measure is used in averaging ratios, such as rates and prices, when certain conditions are agreed upon.

In the case of time rates, we have ratios between two quantities 
one of which is in units of time, which we will denote by t, and the 
other is in units of some element like distance or accomplishment or 
temperature, etc. Denote this second element, different from time, 
by d. Then we make the following observations: 
(a) A rate may be stated either in the form d/t or in the form t/d.
Thus, a car which travels at the rate of 30 miles per hour may also be 
said to travel at the rate of 2 minutes per mile. In this illustration 
the second form is not the usual way of expressing the rate, but there 
are cases in which the form t/d is usual. When we say a man takes 
10 seconds to run 100 yards we are expressing his rate in time per 
unit of distance (t/d). 

(b) In averaging rates one should first decide whether d or t should properly be the basic or “fixed” element in the discussion. Occasionally there is a difference of opinion about which element should most appropriately be regarded as fixed. For example, suppose a class of students has been given 15 minutes in which to work as many as they can of a given list of problems, and the number of problems worked correctly by each student recorded. Some educational statisticians would say that time should be the fixed element here and that number of problems solved (in a unit of time) should be the variable. Others would say that the number of minutes (t) a student required to work one problem is the proper variable and that a problem (d) should be regarded as the fixed element in the discussion.

In one case the rates are equally weighted in the sense of time 
and in the other case they are equally weighted in the sense of the 
element d. 

(c) The harmonic mean of the rates expressed in the form d/t gives the same result as the arithmetic mean of the same rates expressed in the form t/d. This is evident from equation (13) if it is written in the form,

\[ \frac{1}{\text{H.M.}} = \frac{1}{N}\sum \frac{1}{x_i}, \]

and from the fact that rates in one form are merely the reciprocals of the same rates in the other form.

As an illustration, let us consider three cars:

I.   A travels at the rate of 15 miles per hour (1/4 mile per minute),
     B travels at the rate of 20 miles per hour (1/3 mile per minute),
     C travels at the rate of 30 miles per hour (1/2 mile per minute).

But their rates could just as well have been stated as

II.  A travels at the rate of 4 minutes to the mile,
     B travels at the rate of 3 minutes to the mile,
     C travels at the rate of 2 minutes to the mile.

The harmonic mean of the rates as stated in I is 20 miles per hour, i.e., 1/3 of a mile per minute, and the arithmetic mean of the rates as stated in II is 3 minutes per mile or again, 20 miles per hour. (Verify.)

The third observation, i.e., (c) above, suggests the following discussion. The arithmetic mean of the rates in I is 21 2/3 m.p.h. and this is the harmonic mean of the rates as stated in II.

The question arises, which is the correct average, 20 m.p.h. or 21 2/3 m.p.h.? The problem is indeterminate until it is agreed whether time or distance is the fixed element. The correct average will differ according to the condition agreed upon. This will be made clear in the following analysis.
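The “(Verify.)” above can be carried out with exact fractions. The sketch below is illustrative only; the helper names `arithmetic_mean` and `harmonic_mean` are assumptions, and `Fraction` is used merely to avoid rounding.

```python
from fractions import Fraction


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals, as in (13).
    return len(xs) / sum(1 / x for x in xs)


mph = [Fraction(15), Fraction(20), Fraction(30)]   # rates as stated in I (d/t)
min_per_mile = [60 / v for v in mph]               # rates as stated in II (t/d): 4, 3, 2

print(harmonic_mean(mph))                # miles per hour
print(arithmetic_mean(min_per_mile))     # minutes per mile
print(arithmetic_mean(mph))              # miles per hour
print(60 / harmonic_mean(min_per_mile))  # H.M. of II converted to miles per hour
```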

Case I. Let \(x_i = d_i/t_i\) denote the ith rate, i = 1, 2, ..., n. Then the average rate is

\[ \frac{D}{T} = \frac{\text{total distance}}{\text{total time}} = \frac{t_1x_1 + t_2x_2 + \cdots + t_nx_n}{t_1 + t_2 + \cdots + t_n}. \]

Condition 1. Let distance be the fixed element, i.e., let d be constant. Then \(d = t_ix_i\), and \(t_i = d/x_i\). Therefore, the expression for average rate becomes

\[ \frac{\sum t_ix_i}{\sum t_i} = \frac{nd}{d\sum \dfrac{1}{x_i}} = \frac{1}{\dfrac{1}{n}\sum \dfrac{1}{x_i}}, \]

which is the harmonic mean.

Condition 2. Suppose t is the fixed element. Then \(\sum t_ix_i\) becomes \(t\sum x_i\) since t is a constant, and \(\sum t_i\) becomes nt. Hence, we have for the average rate,

\[ \frac{D}{T} = \frac{t\sum x_i}{nt} = \frac{\sum x_i}{n}, \]

which is the arithmetic mean.

Case II. Let \(x_i = t_i/d_i\) denote the ith rate. Then the average rate is

\[ \frac{T}{D} = \frac{\text{total time}}{\text{total distance}} = \frac{\sum t_i}{\sum d_i}. \]

Condition 1. Suppose d is the fixed element. Then \(t_i = dx_i\) and \(d = t_i/x_i\). Hence, we have

\[ \frac{T}{D} = \frac{d\sum x_i}{nd} = \frac{\sum x_i}{n}. \]

Condition 2. Let t be fixed. Then \(d_i = t/x_i\) and the average rate is

\[ \frac{T}{D} = \frac{nt}{t\sum \dfrac{1}{x_i}} = \frac{1}{\dfrac{1}{n}\sum \dfrac{1}{x_i}}. \]




We therefore state the following rules for averaging rates: 

Rule 1. The harmonic mean is used whenever the fixed element is d 
and the rates are expressed in the form d/t, or when the fixed element 
is t and the rates are expressed in the form t/d. 

Rule 2. The arithmetic mean should be used when the fixed ele- 
ment is t and the rates are expressed in the form d/t, or when the fixed 
element is d and the rates are expressed in the form t/d. 
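The two rules can be illustrated by going back to total distance over total time, which is how the average rate was defined in the analysis. In this sketch the three rates are the car speeds used earlier, and the function names are hypothetical.

```python
from fractions import Fraction

rates = [Fraction(15), Fraction(20), Fraction(30)]  # miles per hour, form d/t


def average_rate_fixed_distance(rates, d=1):
    """Each car covers the same distance d; total distance over total time."""
    total_time = sum(Fraction(d) / x for x in rates)
    return len(rates) * d / total_time


def average_rate_fixed_time(rates, t=1):
    """Each car travels for the same time t; total distance over total time."""
    total_distance = sum(t * x for x in rates)
    return total_distance / (len(rates) * t)


print(average_rate_fixed_distance(rates))  # agrees with the harmonic mean (Rule 1)
print(average_rate_fixed_time(rates))      # agrees with the arithmetic mean (Rule 2)
```

The value chosen for the fixed element cancels out, so any positive d or t gives the same average.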

In the case of prices, which are of course ratios, a similar discussion holds except that now the unit of time is to be replaced by a unit of money. Therefore, prices are ratios between two quantities, one of which is in units of money and the other in units of some commodity or service. They may be stated as so much money per unit of commodity (m/c), or as so many units of commodity per dollar (c/m). Thus, if 100 bushels of wheat are exchanged for 75 dollars of gold, the price of the wheat in terms of gold is 75 ÷ 100, or three-fourths of a dollar of gold per bushel of wheat. Contrariwise, the price of gold in terms of wheat is 100 ÷ 75, or one and one-third bushels of wheat per dollar of gold. Thus, there are always two prices in any exchange.

The correct average will depend upon how the prices are stated and 
upon whether a unit of the commodity (or service) or a unit of money 
is the fixed element. 

The following papers in The Journal of the American Statistical 
Association are recommended: 

1. “ The Nature and Use of the Harmonic Mean ” — W. F. Ferger, 
vol. 26 (1931), pp. 36-40. 

2. “ Calculating the Geometric Mean from a Large Amount of 
Data” — Zenon Szatrowski, vol. 41 (1946), pp. 218-220. 

Examples 

1. In a certain factory a unit of work is completed by A in four minutes, by B 
in five minutes, by C in six minutes, by D in ten minutes, and by E in 
twelve minutes. What is their average rate of working? At this rate 
how many units will they complete in a six-hour day? 

Solution. The rates are expressed in the form t/d but it would seem appropriate to regard t as the basic or fixed element since output per unit of time appears to be the important consideration here. So by Rule 1,

\[ \text{H.M.} = \frac{5}{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{10} + \frac{1}{12}}, \]

that is,

\[ \text{H.M.} = \frac{300}{48} = 6\tfrac{1}{4} \text{ minutes per unit}. \]

In 360 minutes they will complete \(\frac{5 \times 360 \times 4}{25} = 288\) units.
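A quick way to confirm the answer is to compare the harmonic-mean route with a direct count of each worker's output. The sketch below uses exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

minutes_per_unit = [4, 5, 6, 10, 12]   # workers A to E
n = len(minutes_per_unit)

# Harmonic mean of the times per unit.
hm = n / sum(Fraction(1, m) for m in minutes_per_unit)

# Output of all five workers in 360 minutes at the average rate ...
units_at_average = n * 360 / hm
# ... and counted worker by worker.
units_direct = sum(Fraction(360, m) for m in minutes_per_unit)

print(hm, units_at_average, units_direct)
```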

2. A tourist purchases gasoline at three stations, as follows:

Station    Number of gallons of gasoline for $1.00
A          5
B          7
C          6

Here the prices are given in the form c/m and it would seem appropriate to regard gallon (c) as the fixed element and prices (m) per gallon as the variable quantities which are to be averaged. Hence, replacing d/t by c/m and “rates” by “prices” in Rule 1, we are led to find the harmonic mean.

\[ \text{H.M.} = \frac{3}{\frac{1}{5} + \frac{1}{7} + \frac{1}{6}} = \frac{630}{107} \text{ gals. per \$1.00} = \$\frac{107}{630} \text{ per gal.} \]


Exercises 

1 . (a) The arithmetic mean of a set of 30 numbers is 82. What is the sum of 

these numbers? 

(b) The G.M. of ten numbers is 1.40. What is the product of these ten 
numbers? 

2. In chemistry a student was graded 65 in final examination, 85 in recitation 

and 80 in laboratory. These grades were weighted 1, 2, and 3 respectively. 
Find the student’s average grade. 

3. At the end of his first semester in college a freshman had credits as follows: 

4 hours of mathematics with a grade of 88, 4 hours of English with a grade 
of 80, 3 hours of history with a grade of 85, and 4 hours of physics with a 
grade of 78. What was his average grade per hour of credit? 

4. Find the median of Table 12. 

5. The population of a city increased in 5 years from 225,000 to 245,000. What 

was the average increase per year? What was the average annual rate of 
increase? 

6. The number of bacteria in a certain culture was found to be \(4 \times 10^6\) at noon of one day. At noon the next day the number was found to be \(9 \times 10^6\). If the number increased at a constant rate per hour, how many bacteria were there at midnight?

7. Find the average (G.M.) rate of interest for five years during which the interest rates were 4.25%, 5.3%, 4.65%, 3.86%, 4.38%.

Hint. \((1 + i)^5 = (1.0425)(1.053)(1.0465)(1.0386)(1.0438)\).

8. Find the harmonic mean of the first fifteen positive integers.

9. For two positive numbers, a and b, the geometric mean is \(x = \sqrt{ab}\). This is also called the mean proportional between a and b, since a : x = x : b. By drawing a semicircle on a + b as diameter, show how the value of x can be constructed geometrically.

10. The following table gives the population of the U. S. at each 10-year census 
from 1860 to 1920. 


Year (x)    Population (millions)    Ratio of Each Census Figure to Preceding
1860              31.4
1870              38.6                        1.23
1880              50.2                        1.30
1890              63.0                        1.25
1900              76.0                        1.20
1910              92.0                        1.21
1920             105.7                        1.15


What is the average rate of increase per decade? Using this average, 
estimate the population for 1930 from the 1920 census figure. 

11 . If a series of positive variates form a geometric progression show that their 

logarithms form an arithmetic progression. 

12 . Find the geometric mean of the following: 

(a) 2, 4, 8, 16, 32. 

(b) 47, 92, 123, 218. 

13. Given two sets of n positive variates each:

\[ y_{11}, y_{12}, y_{13}, \ldots, y_{1n} \]
\[ y_{21}, y_{22}, y_{23}, \ldots, y_{2n}. \]

Prove that the geometric mean of the ratios of corresponding variates in the two sets is equal to the ratio of their geometric means.

14. (a) For a frequency distribution of positive variates show that (10) becomes

\[ \text{G.M.} = \left[x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}\right]^{1/N}, \]

where k is the number of different values of x in the set, any exponent \(f_i\) is the number of times \(x_i\) is repeated, and \(N = \sum_{i=1}^{k} f_i\).

(b) What is the expression for log G.M. when G.M. is defined as in (a)?

15. A wholesale firm has twelve travelling salesmen who make trips of essentially the same length. Of these, eight make their trip in 20 days and four in 15 days. What is the average time per trip? Ans. 18 days.

16. State two rules for averaging prices similar to those given for averaging rates. Give illustrations.

17. Consider any two positive variates \(x_1\) and \(x_2\). Prove that their geometric mean is equal to the geometric mean between their arithmetic mean and their harmonic mean.

18. (Burgess) The following problem arose in a statistical office in Washington during World War I: Suppose 20 boats make 6 trans-atlantic trips each per year, giving as the time for a “turn around” (i.e., time between consecutive departures from the same ports), one-sixth year, approximately 60 days, and that 10 boats make 4 trips per year, giving as their time for a “turn around” one-fourth year, approximately 90 days. (A year of 360 days is used merely for convenience.) What is the average number of days per turn around?

Hint. If we think of the rates expressed as “trips per year” then x = d/t. If t is regarded as the fixed element, then by Rule 2 the arithmetic mean is indicated, and x = 6 for 20 values of x, and x = 4 for 10 values.

If we think of the rates expressed as “days per trip” then x = t/d. If t is the fixed element, by Rule 1 the harmonic mean is the correct average, and x = 60 for 20 values and x = 90 for 10 values. Ans. 5 1/3 trips per year or 67.5 days per trip.

19. Show that if 2 a is the harmonic mean of the two rational numbers b and c, 

then the sum of the squares of the three numbers a, b , and c is the square 
of a rational number. 

(Reference: American Mathematical Monthly , June 1935, p. 394.) 

20. (a) If A, G, and H represent, respectively, the arithmetic, geometric, and 

harmonic means of N unequal positive variates, prove that 

H < G < A 

(Reference: Burgess’ text, p. 101.) 

(b) What can you say if the N positive variates are equal?

21. A plane travels one half of a given distance D in miles at a speed of \(x_1\) miles per hour, and the remaining half distance at a speed of \(x_2\) miles per hour. Show that the average speed for the entire distance is the harmonic mean of \(x_1\) and \(x_2\). Half of this average speed is called the “radius of action per hour,” i.e., it is the outbound distance that a plane can travel and return in one hour. The “radius of action” of a plane would be the “radius of action per hour” multiplied by the number of hours in flight.



CHAPTER IV 
MOMENTS 

1. Moments about an Arbitrary Origin. One of the general prob- 
lems of statistics is to summarize and characterize data. In the 
words of R. A. Fisher, 

A quantity of data which by its mere bulk may be incapable of entering the 
mind is to be replaced by relatively few quantities which shall adequately rep- 
resent the whole, or which, in other words, shall contain as much as possible, 
ideally the whole, of the relevant information contained in the original data. 1 

These “ relatively few quantities ” are usually expressed in terms 
of moments. Moments are of different orders and the student is 
already familiar with what is now to be known as the first moment, 
namely, the arithmetic mean of the first powers of the variates. We 
will also need in our work the arithmetic means, respectively, of the 
second, third, and fourth powers of the variates. With reference to 
an arbitrary origin, moments are denoted by v (the Greek letter nu) 
with a subscript specifying the order. 

The first four moments, relative to the x-origin and in the x unit, are defined as follows:

(1) \[ \nu_r = \frac{1}{N}\sum f_ix_i^r, \qquad r = 1, 2, 3, 4, \]

i varying from 1 to k.

A more general definition of the ν's is

(1a) \[ \nu_r = \frac{1}{N}\sum f_i(x_i - x_0)^r \]

for the rth moment about an arbitrary point \(x_0\). When \(x_0 = 0\) and r = 1, 2, 3, and 4, we have the definitions stated in (1). If r = 0 we have the zeroth moment and \(\nu_0 = 1\).

1 Foundations of Theoretical Statistics, Philosophical Transactions of the Royal Society, vol. 222 A (1922), p. 309.

In statistics we work with moments per unit frequency. The term “moment” has its origin in mechanics where we speak of the “moment of a force.” Suppose we have a rigid bar, called a lever, with one point of support known as a fulcrum (Figure 10). If a force \(f_1\) is applied to the lever at a distance \(x_1\) from the fulcrum O, the product \(x_1f_1\) is called the moment of the force. If there are two or more such forces \(f_1, f_2, \ldots, f_k\) acting in the same direction, and at the distances \(x_1, x_2, \ldots, x_k\), respectively from O, the total moment of all these forces is

\[ f_1x_1 + f_2x_2 + \cdots + f_kx_k = \sum f_ix_i. \]

[Fig. 10]

If the distances x are squared, we have \(\sum f_ix_i^2\) as total second moment, and \(\sum f_ix_i^r\) represents the rth moment.

It is by analogy with this mechanical concept that the expressions 
in (1) are called statistical moments (per unit frequency) about zero 
as origin. 


Exercises 

1. Write out the expanded form of the ν's defined in (1).

2. Calculate the values of \(\nu_1\), \(\nu_2\), and \(\nu_3\) for the following distributions:

      (a)                (b)
   x      f           x      f
   0      1          -3      1
   1      3          -2      3
   2      5          -1      5
   3     10           0      5
   4      5           1      3
   5      2           2      1

3. (a) Prove that \(\nu_0\) is always equal to unity.

(b) Prove that moments of even order are always positive or zero, but that moments of odd order may be positive, negative, or zero.

(c) Show that the odd moments are all zero if both the x's and f's are symmetrical with respect to the origin of x, as, for example,

x:   -1.5   -1.0   -0.5    0.5    1.0    1.5
f:     1      2      3      3      2      1


2. Moments in Units of the Class Interval. In Chapter III, §8, the mean in the x unit was obtained by first finding the mean in the u unit, viz., \(\frac{1}{N}\sum f_iu_i\), and then changing over into the x unit by multiplying by the interval c. In our subsequent work, which requires the higher moments, we shall find it convenient to use a similar procedure, and find those moments in the u unit, where \(u = (x - x_0)/c\). It is desirable, therefore, in labeling the moments for any distribution, to specify whether they are in the unit of x or u. This is commonly done by the use of a second subscript on ν. Thus \(\nu_{r:u}\) denotes the rth moment in the u unit and relative to the u-origin. Therefore,

(2) \[ \nu_{r:u} = \frac{\sum f_iu_i^r}{N}. \]

Similarly, \(\nu_{r:x}\) will mean \(\frac{1}{N}\sum f_ix_i^r\). When there is no ambiguity, the second subscript on ν may be omitted.

3. Moments about the Mean. Formulas (1) and (2) define the moments taken about zero as origin although in different units. When the mean is chosen as origin we have the most important set of moments in the theory of statistics. In this case the Greek letter μ (mu) is used to denote the moments, and it is always understood that the use of μ specifies the mean as origin. It does not, however, designate the unit, so the second subscript may still be necessary. Therefore, the rth moment about the mean is defined by either of the following expressions:

(3) \[ \mu_{r:x} = \frac{1}{N}\sum f_i(x_i - \bar{x})^r, \qquad \mu_{r:u} = \frac{1}{N}\sum f_i(u_i - \bar{u})^r. \]

The mean is a sort of balance point. If weights proportional to 
the frequencies are suspended along a horizontal bar at distances from 
one end proportional to the numbers representing the class marks, 
then the bar will balance at the weighted mean of the distances. In 
mechanics this point is known as the abscissa of the center of gravity 
or centroid . Theorem VI of Chapter III, §7, is another way of say- 
ing that the given distribution is in equilibrium about this point. 

4. Relations between the μ's and ν's. We shall see that the descriptive constants mentioned at the beginning of the chapter are defined in terms of the moments about the mean, but the moments about an arbitrary point are easier to calculate. In other words, what we desire are the values of \(\mu_r\), but their computation directly from the definitions (3) may be very laborious even in the u unit due to the fact that \((u - \bar{u})\) usually involves decimals. Raising these decimals to the second, third, and fourth powers becomes tedious even with the aid of a computing machine. On the other hand, the ν's defined in (2) are readily computed. Therefore, instead of computing the μ's directly we obtain them indirectly from the ν's. The relations between the μ's and ν's can be found by expanding, by the Binomial Theorem, either of the expressions under the summation sign in (3) for r = 2, 3, 4. This is done in the u unit as follows:

\[ \mu_2 = \frac{1}{N}\sum f_i(u_i - \bar{u})^2 = \frac{1}{N}\sum f_iu_i^2 - 2\bar{u}\cdot\frac{1}{N}\sum f_iu_i + \bar{u}^2 = \nu_2 - 2\bar{u}\nu_1 + \bar{u}^2, \]

(4) \[ \mu_2 = \nu_2 - \nu_1^2, \quad \text{since } \bar{u} = \nu_1. \]

(5) \[ \mu_3 = \frac{1}{N}\sum f_i(u_i - \bar{u})^3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3. \]

(6) \[ \mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4. \]

These formulas are important and the student should be able to derive them. It should be apparent that these moment relations hold also in the x unit. However, if we have the μ's in the u unit and we desire them in the x unit they may be found as follows:

(7) \[ \mu_{2:x} = c^2\mu_{2:u}, \qquad \mu_{3:x} = c^3\mu_{3:u}, \qquad \mu_{4:x} = c^4\mu_{4:u}. \]

The first of the relations given in (7) is proved below. The others may be proved in a similar manner.

\[ \mu_{2:x} = \frac{1}{N}\sum f_i(x_i - \bar{x})^2 \quad \text{by definition,} \]
\[ = \frac{1}{N}\sum f_i(x_0 + cu_i - x_0 - c\bar{u})^2 \quad \text{by (4a) and (5), Chapter III,} \]
\[ = \frac{c^2}{N}\sum f_i(u_i - \bar{u})^2 = c^2\mu_{2:u}. \]

We see that the indirect method of computing the μ's (in the u unit) involves two steps. First the ν's are computed according to the definitions in (2). This step is illustrated in Table 18. Then we calculate the μ's by substituting the computed ν's in relations (4), (5), and (6). The μ's in the x unit could then be obtained, if desired, by means of (7).

Before proceeding with the second step it is desirable to check the ν's or, at least, the totals of the columns from which they are obtained. This can be done if we have another column headed \(f(u + 1)^4\), and observe that

\[ \sum f(u + 1)^4 = \sum fu^4 + 4\sum fu^3 + 6\sum fu^2 + 4\sum fu + \sum f. \]

This is known as Charlier's check. An alternative one is to check the entries in the column \(fu^4\) against the proper entries in Pearson's Tables for Statisticians and Biometricians, Table L.

Charlier's check is a necessary but not a sufficient check. That is to say, compensating errors may occur which this check would not detect. However, the occurrence of such errors is very unlikely.

Applying Charlier's check to Table 18 we have

1220 = 1088 + 4(-236) + 6(176) + 4(-20) + 100 = 1220.

Table 18. Moments for Distribution of Grades

   Data       |                  Computations
  x       f   |   u     fu     fu^2    fu^3     fu^4    f(u+1)^4
 34.5     2   |  -4     -8      32     -128      512      162
 44.5     3   |  -3     -9      27      -81      243       48
 54.5    11   |  -2    -22      44      -88      176       11
 64.5    20   |  -1    -20      20      -20       20        0
 74.5    32   |   0      0       0        0        0       32
 84.5    25   |   1     25      25       25       25      400
 94.5     7   |   2     14      28       56      112      567
 Sums   100   |        -20     176     -236     1088     1220
 Sums/N   1   |       -.20    1.76    -2.36    10.88    (for Charlier's check)
              |       ν1:u    ν2:u     ν3:u     ν4:u


Hence we may proceed with confidence to compute the $\mu$'s. Using 
relations (4), (5), and (6): 

$\mu_{2:u} = 1.76 - (-.20)^2 = 1.72$

$\mu_{3:u} = -2.36 - 3(1.76)(-.20) + 2(-.20)^3 = -1.320$

$\mu_{4:u} = 10.88 - 4(-2.36)(-.20) + 6(1.76)(-.20)^2 - 3(-.20)^4 = 9.4096.$
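The two steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `power_sum` is hypothetical, and relations (4), (5), and (6) are written out in the form used in the numerical work above.

```python
# Table 18 in the u unit: class frequencies f and coded class marks u.
f = [2, 3, 11, 20, 32, 25, 7]
u = [-4, -3, -2, -1, 0, 1, 2]
N = sum(f)

def power_sum(r, shift=0):
    """Sum of f * (u + shift)**r over all classes (hypothetical helper)."""
    return sum(fi * (ui + shift) ** r for fi, ui in zip(f, u))

# Charlier's check: the f(u+1)^4 column total must equal the binomial expansion.
lhs = power_sum(4, shift=1)
rhs = power_sum(4) + 4 * power_sum(3) + 6 * power_sum(2) + 4 * power_sum(1) + N
charlier_ok = (lhs == rhs)

# Step 1: moments about the arbitrary origin (the nu's).
nu1, nu2, nu3, nu4 = (power_sum(r) / N for r in (1, 2, 3, 4))

# Step 2: moments about the mean (the mu's), relations (4), (5), (6).
mu2 = nu2 - nu1 ** 2
mu3 = nu3 - 3 * nu2 * nu1 + 2 * nu1 ** 3
mu4 = nu4 - 4 * nu3 * nu1 + 6 * nu2 * nu1 ** 2 - 3 * nu1 ** 4

print(charlier_ok, round(mu2, 4), round(mu3, 4), round(mu4, 4))
```

Working with integer column totals keeps Charlier's check exact; the division by N is postponed until the $\nu$'s are formed.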

The following check, which can be handled readily on a machine, 
may be used to check the $\mu$'s: 

$\nu_4 = \dfrac{1}{N} \sum f_i x_i^4 = \dfrac{1}{N} \sum f_i [(x_i - \nu_1) + \nu_1]^4$

$\quad = \mu_4 + 4 \mu_3 \nu_1 + 6 \mu_2 \nu_1^2 + \nu_1^4.$
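As a brief illustration, the check can be applied to the values found for Table 18. The variable names below are illustrative assumptions; the numbers are those of the text.

```python
# Values for Table 18 in the u unit, taken from the text.
nu1, nu4 = -0.20, 10.88
mu2, mu3, mu4 = 1.72, -1.320, 9.4096

# Rebuild nu4 from the mu's; agreement with the tabulated nu4 confirms the mu's.
rebuilt_nu4 = mu4 + 4 * mu3 * nu1 + 6 * mu2 * nu1 ** 2 + nu1 ** 4
print(round(rebuilt_nu4, 4))
```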

Before explaining the applications of $\mu_2$, $\mu_3$, and $\mu_4$ we present some 
exercises which will aid the student in mastering the procedure thus 
far developed. 

Exercises 

1. (a) Verify relations (4), (5), and (6). 

(b) Show that these relations hold also in the x unit. 

(c) Prove that $\mu_1 = 0$ in any unit. 

(d) When f = 1, show that 

$\nu_{2:x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i^2.$
2. Verify the relations given in (7). 

3. Using Table 18 as a model find the $\nu$'s for Iowa City rainfall by extending 
Table 17. 

4. Find the $\mu$'s from your results in Exercise 3 above. 

5. Standard Deviation. Formula (4), $\mu_2 = \nu_2 - \nu_1^2$, is perhaps 
the most important of the moment relations for elementary statistics. 
It states that the second moment about the mean is equal to the 
second moment about zero diminished by the square of the mean 
measured from zero. 

Many of the definitions in statistics are essentially those of physics 
and mechanics. The analogy between the mean and centroid has 
been mentioned. The above statement about formula (4) is a well- 
known proposition in mechanics when the word centroid is substi- 
tuted for mean. 

In mechanics the equivalent of $N\mu_2$ is called the moment of inertia 
(about the axis through the centroid) and $(\mu_2)^{1/2}$ is the radius of 
gyration. These notions are carried over in statistics. Suppose a thin 
metal plate in the shape of a histogram is rotating about a vertical 
axis through its centroid. There is a distance from the centroid at 
which the entire mass of the histogram could be concentrated 
without changing its moment of inertia. This distance is the 
square root of $\mu_2$. It is an average rotational radius for all 
particles of the rotating mass. In statistics, $(\mu_2)^{1/2}$ is called the 
standard deviation and is denoted by the small Greek letter $\sigma$. Therefore 
we have 

(8)    $\sigma_x = (\mu_{2:x})^{1/2}$
       $\sigma_u = (\mu_{2:u})^{1/2}$
       $\sigma_x = c\,\sigma_u.$


We shall see later that $\sigma$ is a measure of what is called dispersion. 
More precisely, it measures the extent to which the data are spread 
out "on the average" on either side of the mean. (See Figure 11.) 
The student will obtain a more complete understanding of $\sigma$ as the 
course develops. 

The mean and standard deviation are always expressed finally in 
the same units as the variates. If x represents inches, we desire the 
mean and standard deviation in inches. When obtained they should 
be labelled appropriately. 
Fig. 11 


Example. For Table 18, we have 

$\bar{x} = c\bar{u} + x_0 = 10(-.20) + 74.5 = 72.5\%$

$\sigma_u = (\mu_{2:u})^{1/2} = (1.72)^{1/2} = 1.31$

$\sigma_x = c\,\sigma_u = 10(1.31) = 13.1\%$
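The example above amounts to three lines of arithmetic. A minimal sketch follows, assuming the class interval c = 10 and the origin $x_0 = 74.5$ of Table 18; the variable names are illustrative only.

```python
import math

c, x0 = 10, 74.5             # class interval and arbitrary origin of Table 18
u_bar, mu2_u = -0.20, 1.72   # mean and second central moment in the u unit

x_bar = c * u_bar + x0       # mean in the x unit
sigma_u = math.sqrt(mu2_u)   # standard deviation in the u unit
sigma_x = c * sigma_u        # the origin x0 plays no part here

print(round(x_bar, 1), round(sigma_u, 2), round(sigma_x, 1))
```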

Thus, we have explained the use of the first and second moments. 

The student will observe that the change from $\sigma_u$ to $\sigma_x$ does not 
involve $x_0$. The standard deviation is affected by the change in units 
but is independent of the origin of reference. To prove this let 
$x' = x - x_0$, whence $\bar{x}' = \bar{x} - x_0$ (why?). Then 

$\sigma_{x'}^2 = \dfrac{1}{N} \sum f_i (x_i' - \bar{x}')^2$

$\quad = \dfrac{1}{N} \sum f_i [x_i - x_0 - \bar{x} + x_0]^2$

$\quad = \dfrac{1}{N} \sum f_i (x_i - \bar{x})^2$

$\quad = \mu_{2:x} = \sigma_x^2.$

This suggests the more general 

Theorem. The value of $\mu_r$ remains invariant under a transformation 
which changes only the origin of reference of the variates. 

The student is asked to prove the equivalent of this theorem in 
Exercise 3 after §9. 
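A numerical illustration of the theorem (not a proof) may help. The sketch below computes the moments about the mean for the class marks of Table 18 and again after shifting the origin; the function name `central_moment` and the choice of shift are assumptions made for the illustration.

```python
# Class marks and frequencies of Table 18.
x = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
f = [2, 3, 11, 20, 32, 25, 7]

def central_moment(values, freqs, r):
    """r-th moment about the mean of a frequency distribution."""
    n = sum(freqs)
    mean = sum(fi * vi for fi, vi in zip(freqs, values)) / n
    return sum(fi * (vi - mean) ** r for fi, vi in zip(freqs, values)) / n

x0 = 74.5                            # any constant would serve
shifted = [xi - x0 for xi in x]      # change of origin only

for r in (2, 3, 4):
    print(r, central_moment(x, f, r), central_moment(shifted, f, r))
```

The two columns printed agree for every r, while the means of the two series differ by $x_0$.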

6. Standard Units. The above section explains $\mu_2$. There remains 
the explanation of $\mu_3$ and $\mu_4$. We will lead up to this by 
defining standard units. We have mentioned the transformation 
$x' = x - \bar{x}$. Another very useful transformation consists in measuring 
such deviations from the mean in units of the standard deviation, 
$\sigma_x$, of the entire distribution. They are then known as standard 
units and will be designated by t. Thus, 

(9)    $t = \dfrac{x - \bar{x}}{\sigma_x}.$

Graphically, this translates the origin to the mean and measures 
distances along the horizontal axis in terms of $\sigma_x$. It is a special case 
of the more general transformation 

$u = \dfrac{x - x_0}{c}.$

The significant characteristic of the t variate is its independence of 
the unit in which the original measurements were taken. For example, 
suppose we were concerned with obtaining the linear measurements 
of a set of individuals. One distribution of variates would 
result if the measurements were made in feet. In this case $x'$, $\bar{x}$, and 
$\sigma_x$ would also be in feet. If the measurements were taken in inches, 
then $x'$, $\bar{x}$, and $\sigma_x$ would be in inches, and each of these values would 
be, numerically, twelve times as large as the corresponding numbers 
in the first distribution. However, the variates expressed in standard 
units would be the same for the two distributions. Thus if 

$\bar{x} = 50$ ft. $= 50(12)$ in., 

and 

$\sigma_x = 5$ ft. $= 5(12)$ in., 

then for an individual measurement of x = 60 ft. = 60(12) in., we 
have 

$t = \dfrac{10 \text{ ft.}}{5 \text{ ft.}} = \dfrac{10(12) \text{ in.}}{5(12) \text{ in.}} = 2.$


It is obvious, therefore, that standard units provide a basis for 
comparing distributions. Moreover, they make possible important 
simplifications in certain mathematical operations. 
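The feet-and-inches illustration can be restated as a short sketch. The function name `standard_unit` is an assumption made for the illustration; it simply applies relation (9).

```python
def standard_unit(x, mean, sigma):
    """Relation (9): deviation from the mean in units of sigma."""
    return (x - mean) / sigma

# The same individual, measured first in feet and then in inches.
t_feet = standard_unit(60.0, 50.0, 5.0)
t_inches = standard_unit(60.0 * 12, 50.0 * 12, 5.0 * 12)

print(t_feet, t_inches)
```

The factor 12 enters numerator and denominator alike, so the two values of t coincide.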

With the aid of a computing machine, a distribution may be easily 
transformed into standard units by means of the so-called continuous 
process. To illustrate, suppose for the distribution of Table 9 (§11, 
Chapter I), it has been found that 

$\bar{x} = 47.712$ lbs. 

$\sigma_x = 5.772$ lbs. 

By relation (9), then, 

$t = \dfrac{x - 47.712}{5.772} = .17325x - 8.2661.$

Referring to the discussion of the continuous method given in the 
Introduction, we observe that here k = -8.2661, n = .17325, and we 
desire the values of t corresponding to the values of x given in Table 9. 
For the values of x such that nx < |k|, we write the above relation in 
the form 

$-t = 8.2661 - .17325x.$

The procedure¹ now is to register 8.266100 on the product register, 
punch the constant factor .17325 on the keyboard, and then by turning 
the crank backward so that the successive values of x appear on 
the revolution register, we subtract from |k| the products of this 
multiplier and the values of x. The various values of x are built over 
from one to another without clearing the dial. The resulting values 
of -t are read at each stage from the product register until we get 
-t = 0.383. From here, nx > |k|, so we clear the dials and start 
over using the original form of the relation between x and t. We now 
register -8.266100 on the product register by turning the crank 
backward, punch .17325 on the keyboard, and turn the crank forward 
to form the values of x on the revolution register. The values 
of t are read as before from the product register at each stage of the 
build-over process. In this way the following set of standard 
variates is obtained: 

Table 19 

    x        f         t
  29.5       1      -3.155
  33.5      14      -2.462
  37.5      56      -1.770
  41.5     172      -1.076
  45.5     245      -0.383
  49.5     263       0.310
  53.5     156       1.003
  57.5      67       1.696
  61.5      23       2.389
  65.5       3       3.082


1 If automatic machines are available the instructor will explain the pro- 
cedure. 
We see from Table 19 that a range of t = ±3 takes in practically all 
the variates. This is typical of the more common distributions. 
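Without a computing machine, the same standard variates can be obtained directly from relation (9). The sketch below assumes the class marks of Table 19, which run from 29.5 to 65.5 in steps of 4; the variable names are illustrative. Direct division may differ from the machine build-over in the last decimal place.

```python
x_bar, sigma_x = 47.712, 5.772                    # values quoted in the text
class_marks = [29.5 + 4 * i for i in range(10)]   # 29.5, 33.5, ..., 65.5

# Relation (9) applied to each class mark.
t_values = [(x - x_bar) / sigma_x for x in class_marks]

for x, t in zip(class_marks, t_values):
    print(f"{x:5.1f}  {t:7.3f}")
```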

If $\bar{x} = 0$, then $t = x/\sigma$ and the origin of t is the same as the origin of 
x. Some writers use X to denote the variates (i.e., pounds, dollars, 
temperatures, etc.), and use x to denote deviations from the mean. 
In that notation, $t = x/\sigma$ would have the same meaning as our 
equation (9). Occasionally in later chapters we shall find it convenient 
to designate deviations from the mean by x (instead of $x'$). If so, it 
will be stated that the origin of x is at the mean or centroid. 

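7. Moments in Standard Units. The moments in standard units 
are denoted by the Greek letter alpha, $\alpha$. Thus for the rth moment 
in standard units, we have $\alpha_r = \dfrac{1}{N} \sum f_i t_i^r$. However, it is not 
necessary to transform the variates into t units in order to compute the $\alpha$'s. 
We shall show that they are functions of the $\mu$'s. Thus 

$\alpha_r = \dfrac{1}{N} \sum f_i t_i^r$    by definition 

$\quad = \dfrac{1}{N} \sum f_i \left( \dfrac{x_i - \bar{x}}{\sigma_x} \right)^r$    from (9) 

$\quad = \dfrac{1}{(\sigma_x)^r} \cdot \dfrac{1}{N} \sum f_i (x_i - \bar{x})^r.$    Why? 

Hence 

(10)    $\alpha_r = \dfrac{\mu_{r:x}}{(\sigma_x)^r} = \dfrac{\mu_{r:x}}{(\mu_{2:x})^{r/2}}.$    Why? 

Letting r = 1, 2, 3, 4 in (10) we have 

(10a)   $\alpha_1 = \dfrac{\mu_{1:x}}{\sigma_x} = 0$

        $\alpha_2 = \dfrac{\mu_{2:x}}{\sigma_x^2} = 1$    from (8). 

        $\alpha_3 = \dfrac{\mu_{3:x}}{(\sigma_x)^3}$

        $\alpha_4 = \dfrac{\mu_{4:x}}{(\sigma_x)^4}$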


It is obvious that $\alpha_1$ and $\alpha_2$ are abstract numbers. This is also the 
case for the other $\alpha$'s. In the expressions for $\alpha_3$ and $\alpha_4$ both 
numerator and denominator are of the same dimension. That is to say, in 
$\alpha_3 = \mu_3/\sigma^3$ both numerator and denominator are the cubes of 
whatever unit is used in the original measurements, and therefore their 
ratio is of zero dimension, a pure number. Similarly, in $\alpha_4 = \mu_4/\sigma^4$ 
both numerator and denominator are the fourth powers of the same 
unit, and therefore $\alpha_4$ is an abstract number. 

Some writers use $g_1$ instead of $\alpha_3$ and $g_2$ for $\alpha_4 - 3$. 

8. Use of $\alpha_3$ and $\alpha_4$. Since $\alpha_1$ and $\alpha_2$ have the same values for all 
frequency distributions, their computation contributes nothing to 
the description or characterization of a distribution. But the values 
of $\alpha_3$ and $\alpha_4$ depend upon the shape of the histogram representing a 
distribution, and are therefore useful in distinguishing between types 
of distributions. Thus, we observe that 

$\mu_3 = \dfrac{1}{N} \sum f (x - \bar{x})^3$

is a measure of asymmetry about the mean. If the variates are 
distributed symmetrically about $\bar{x}$ then $\mu_3 = 0$. But if the positive 
deviations from the mean outweigh the negative deviations then 
$\mu_3 > 0$, whereas if the negative deviations predominate, then $\mu_3 < 0$. 
Cubing the deviations gives a measure which is sensitive both to their 
size and sign but the result is in cubic units. Now symmetry, or lack 
of it, is not a function of the original units of measurement, so if we 
divide $\mu_3$ by $\sigma^3$ we get a pure number. Thus $\alpha_3$ is a satisfactory 
measure for comparing symmetry in distributions of different units of 
measurement. 

The quantity $\alpha_4$ measures a characteristic called "kurtosis." It 
refers to the relative number of variates in the vicinity of the mean. 
More will be said about $\alpha_3$ and $\alpha_4$ later on. At this time, emphasis 
should be placed upon their calculation rather than upon the 
information which they yield. 

Inasmuch as the $\alpha$'s are independent of the unit of measurement, 
they may be computed from the moments in the u unit. Changing 
these moments into the x unit would only introduce the same factor 
into the numerator and denominator, which would of course divide 
out. Thus: 

$\alpha_3 = \dfrac{\mu_{3:x}}{\sigma_x^3} = \dfrac{c^3 \mu_{3:u}}{c^3 \sigma_u^3} = \dfrac{\mu_{3:u}}{\sigma_u^3}$

$\alpha_4 = \dfrac{\mu_{4:x}}{\sigma_x^4} = \dfrac{c^4 \mu_{4:u}}{c^4 \sigma_u^4} = \dfrac{\mu_{4:u}}{\sigma_u^4}$
For Table 18 we have 

$\alpha_3 = \dfrac{-1.320}{(1.72)(1.31)} = -0.586,$

$\alpha_4 = \dfrac{9.4096}{(1.72)^2} = 3.18.$


Although no limits can be placed on the possible values which $\alpha_3$ 
and $\alpha_4$ may take, it may be said that for the more common 
distributions $\alpha_4$ fluctuates around 3 and $\alpha_3$ is usually not more than 2 nor 
less than -2. We cannot go into the theoretical reasons for these 
values and we mention them here merely to guide the student as to 
what is a reasonable result to expect in the exercises in this book. 
In this connection, the inequality¹ 

$\alpha_4 \geq \alpha_3^2 + 1$

may also prove useful. When the numerical value of $\alpha_3$ is large, the 
distribution may be of the J-shaped type which is an extreme form 
of the asymmetrical type. However, these types cannot always be 
distinguished by elementary methods if the original data are not 
available. 
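The computation of $\alpha_3$ and $\alpha_4$ for Table 18, together with the inequality just quoted, may be sketched as follows. The variable names are illustrative. The sketch carries $\sigma_u$ at full precision, so $\alpha_3$ comes out near -0.585 rather than the -0.586 obtained in the text with $\sigma_u$ rounded to 1.31.

```python
import math

# Moments about the mean for Table 18, in the u unit (from the text).
mu2, mu3, mu4 = 1.72, -1.320, 9.4096

sigma_u = math.sqrt(mu2)
alpha3 = mu3 / sigma_u ** 3      # relation (10) with r = 3
alpha4 = mu4 / mu2 ** 2          # relation (10) with r = 4

# The inequality alpha4 >= alpha3**2 + 1 serves as a rough sanity check.
inequality_holds = alpha4 >= alpha3 ** 2 + 1

print(round(alpha3, 3), round(alpha4, 2), inequality_holds)
```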

9. Summary. The quantities $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ are called the 
descriptive constants of the distribution. They (together with N) are 
the "relatively few quantities" (§1) which, in certain cases, 
contain all the relevant information in the distribution. Table 20 will 
serve as a model for the procedure which the student should follow 
in computing these quantities. Of course, if the work is done on a 
computing machine, only the totals of the power sums need be 
recorded. The detail of the columns may be omitted. In Table 20, 
c = 1, so $\sigma_x = \sigma_u$. Obviously, this would not be true in general. 

The calculation of the $\nu$'s proceeds naturally as an extension of the 
work required to compute $\bar{x}$ for a frequency distribution. Thus to 
obtain $\bar{x}$ we first compute $\nu_{1:u}$ and then obtain $\bar{x}$ from the relation 

$\bar{x} = c\bar{u} + x_0.$

To obtain the standard deviation we need the value of $\nu_2$ because $\sigma_x$ 
is found from the relations 

$\sigma_u^2 = \nu_{2:u} - \bar{u}^2,$

$\sigma_x = c\,\sigma_u.$



1 "A Note on Skewness and Kurtosis," J. Ernest Wilkins, Jr. Annals of 
Mathematical Statistics, vol. 15 (1944), pp. 333-335. 
The next chapter is devoted to a discussion of dispersion of which $\sigma_x$ 
is a measure. To be sure, the standard deviation is only one of several 
measures of dispersion, just as the mean is only one of several 
averages. But both the mean and the standard deviation play important 
roles in the theory and practice of statistics. It is important to 
master the pattern by which they are computed in a frequency 
distribution. 

In order to compute $\alpha_3$ and $\alpha_4$ we first require $\nu_3$ and $\nu_4$ (in addition 
to $\nu_1$ and $\nu_2$). Then $\mu_3$ and $\mu_4$ are obtained from (5) and (6). Finally, 

$\alpha_r = \dfrac{\mu_{r:u}}{\sigma_u^r}$

is computed for r = 3 and r = 4. The characteristics of a 
distribution which $\alpha_3$ and $\alpha_4$ describe will be discussed in Chapter VI and 
again in Part II. In elementary work they are less important than 
$\bar{x}$ and $\sigma_x$. 

With regard to the number of decimal places to be retained in 
computations, the author agrees with Dr. Shewhart who says: “It 
does not appear feasible ... to lay down simple, practical, and in- 
fallible rules.” Reasons in support of this opinion are stated in his 
book, 1 pp. 79-80. For other remarks in this connection, the reader 
is referred to the books by Walker and Scarborough which are cited 
in our Introduction. 


Exercises 

1. (a) What is the numerical value of the mean of any distribution of variates 

expressed in t units? 

(b) What is the standard deviation of such a distribution? Hint: $\sigma_t = \sqrt{\alpha_2}$. 

2. (a) Show that $(x - \bar{x}) = c(u - \bar{u})$ and hence that $t = (u - \bar{u})/\sigma_u$. 

(b) Show that we obtain the same results for the $\alpha$'s if we take 

$t = \dfrac{u - \bar{u}}{\sigma_u}.$

3. Prove: If any constant is added algebraically to each variate of a series the 
values of $\mu_r$ for the new series will be identical with the corresponding values 
of $\mu_r$ of the original series. 

4. Suppose each variate is multiplied by a constant. What effect would this 
have on $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$? 

1 See footnote, p. 52. 
5. Show that the standard deviation of x may be written 

[ 1 l l/2 

jjy S/ife — ^)j 

[ 1 "U/a 

jfUaf-vj * 

6. Prove the general relation 

$\mu_{r:x} = c^r \mu_{r:u}$

of which the relations given in (7) are special cases when r = 2, 3, 4. 
Hint: $(x - \bar{x}) = c(u - \bar{u})$. 

7. (a) Show that $\alpha_0 = 1$. 

(b) Show that $\sigma^r = (\mu_2)^{r/2}$ in both the x and u units. 

8. Prove from (4) that $\mu_2$ is less than or at most equal to $\nu_2$, the same unit being 
used in each case. 

9. Find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for Iowa City rainfall using your results from 
Problem 4 of the preceding set of Exercises. 

Ans. 

$\bar{x} = 2.80$ in.      $\alpha_3 = 1.29,$
$\sigma_x = 2.01$ in.     $\alpha_4 = 4.58.$

10. Using Table 20 as a model find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for the distributions in §11, 
Chapter I, according to the direction of the instructor. 
Table 20 — Specimen Worksheets for Computing the Characterizing 
Constants of a Distribution 

Subject: Span among Adult Males (Table 13) 

    x       f      u      uf     u^2 f     u^3 f      u^4 f   (u+1)^4 f
  58.5      1    -11     -11      121    -1,331     14,641      10,000
  59.5      2    -10     -20      200    -2,000     20,000      13,122
  60.5      1     -9      -9       81      -729      6,561       4,096
  61.5      6     -8     -48      384    -3,072     24,576      14,406
  62.5      7     -7     -49      343    -2,401     16,807       9,072
  63.5     22     -6    -132      792    -4,752     28,512      13,750
  64.5     55     -5    -275    1,375    -6,875     34,375      14,080
  65.5    111     -4    -444    1,776    -7,104     28,416       8,991
  66.5    146     -3    -438    1,314    -3,942     11,826       2,336
  67.5    182     -2    -364      728    -1,456      2,912         182
  68.5    229     -1    -229      229      -229        229           0
  69.5    265      0       0        0         0          0         265
  70.5    263      1     263      263       263        263       4,208
  71.5    217      2     434      868     1,736      3,472      17,577
  72.5    176      3     528    1,584     4,752     14,256      45,056
  73.5    132      4     528    2,112     8,448     33,792      82,500
  74.5     82      5     410    2,050    10,250     51,250     106,272
  75.5     48      6     288    1,728    10,368     62,208     115,248
  76.5     20      7     140      980     6,860     48,020      81,920
  77.5     16      8     128    1,024     8,192     65,536     104,976
  78.5     12      9     108      972     8,748     78,732     120,000
  79.5      3     10      30      300     3,000     30,000      43,923
  80.5      1     11      11      121     1,331     14,641      20,736
  81.5      2     12      24      288     3,456     41,472      57,122
  82.5      1     13      13      169     2,197     28,561      38,416
  Sums  2,000            886   19,802    35,710    661,058     928,254
  (Sums)/N              .443    9.901    17.855    330.529
                          ū       ν2        ν3         ν4

Charlier's check: 

$\sum (u + 1)^4 f = \sum u^4 f + 4 \sum u^3 f + 6 \sum u^2 f + 4 \sum u f + \sum f$

928,254 = 661,058 + 4(35,710) + 6(19,802) + 4(886) + 2,000 = 928,254 



Computations: 

$\bar{x} = c\bar{u} + x_0 = (1)(.443) + 69.5 = 69.943$ in. 

$\bar{u}^2 = .196249, \quad \bar{u}^3 = .086938, \quad \bar{u}^4 = .038514$

$\mu_2 = \nu_2 - \bar{u}^2$
$\quad = 9.901 - .196249 = 9.704751$

$\sigma_u = \sqrt{9.704751} = 3.115$

$\sigma_x = c\,\sigma_u = (1)(3.115) = 3.115$ in. 

$\mu_3 = \nu_3 - 3\nu_2\bar{u} + 2\bar{u}^3$
$\quad = 17.855 - 3(9.901)(.443) + 2(.086938)$
$\quad = 17.855 - 13.158429 + .173876$
$\quad = 4.870447$

$\mu_4 = \nu_4 - 4\nu_3\bar{u} + 6\nu_2\bar{u}^2 - 3\bar{u}^4$
$\quad = 330.529 - 4(17.855)(.443) + 6(9.901)(.196249) - 3(.038514)$
$\quad = 330.529 - 31.639060 + 11.658368 - .115542$
$\quad = 310.432766$

$\sigma_u^3 = (3.115)(9.704751) = 30.230299$

$\sigma_u^4 = (9.704751)^2 = 94.182192$

$\alpha_3 = \dfrac{4.870447}{30.230299} = .161$

$\alpha_4 = \dfrac{310.432766}{94.182192} = 3.296$

Summary: 

$\bar{x} = 69.943$ in.;      $\alpha_3 = 0.161$; 
$\sigma_x = 3.115$ in.;     $\alpha_4 = 3.296$. 
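The whole worksheet of Table 20 can be reproduced mechanically, which also serves to confirm the column totals. The following is a sketch under stated assumptions: the frequencies are those of Table 20, the coded marks run from -11 to 13, and the names `S`, `nu`, and so on are illustrative only.

```python
import math

# Frequencies of Table 20 (span among adult males); u runs from -11 to 13.
f = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
     217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]
u = list(range(-11, 14))
c, x0 = 1, 69.5
N = sum(f)

# Column totals: S[r] is the sum of u^r f.
S = {r: sum(fi * ui ** r for fi, ui in zip(f, u)) for r in (1, 2, 3, 4)}

# Charlier's check on the (u+1)^4 f column.
charlier = sum(fi * (ui + 1) ** 4 for fi, ui in zip(f, u))
assert charlier == S[4] + 4 * S[3] + 6 * S[2] + 4 * S[1] + N

nu = {r: S[r] / N for r in S}            # moments about the arbitrary origin
u_bar = nu[1]
mu2 = nu[2] - u_bar ** 2
mu3 = nu[3] - 3 * nu[2] * u_bar + 2 * u_bar ** 3
mu4 = nu[4] - 4 * nu[3] * u_bar + 6 * nu[2] * u_bar ** 2 - 3 * u_bar ** 4

x_bar = c * u_bar + x0
sigma_x = c * math.sqrt(mu2)
alpha3 = mu3 / mu2 ** 1.5
alpha4 = mu4 / mu2 ** 2

print(round(x_bar, 3), round(sigma_x, 3), round(alpha3, 3), round(alpha4, 3))
```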


10. Sheppard's Corrections. The moments of a frequency 
distribution are computed on the assumption that each variate value in 
a class interval has the value of the class mark for that interval. This 
has the effect of replacing the actual data by somewhat fictitious data 
assigned arbitrarily at the central values of the intervals. Evidently 
a very coarse grouping might be misleading and it can be shown 
mathematically that the above assumption introduces a systematic error, 
called a grouping error, in the results obtained for the second and 
fourth moments about the mean but does not affect $\mu_1$ and $\mu_3$. To 
eliminate this systematic tendency certain corrections are applied to 
$\mu_2$ and $\mu_4$. 

The derivation of these corrections is beyond the scope of an ele- 
mentary course, but it may be worth while to see why it is that cor- 
rections are necessary for some moments and not for others. The 
following argument is intended only as a pedagogical device to give 
a plausible explanation. Suppose a smooth curve represents the 
true frequency distribution while the histogram represents the 
distribution with class marks as the variates. Since the moments are 
computed from the distribution represented by the histogram, we 
scarcely expect our results to be exactly the values of the moments 
of the true distribution, which are, of course, what we seek. In using 
the distribution represented by the histogram, we are neglecting, for 
each rectangle, the little area under the curve shaded A and 
substituting for it the little area shaded B. Suppose that, in general, 
B is a little larger than A, as shown in Figure 12. The excess of B 
over A for those rectangles to the left of $\bar{x}$ will be negative; the 
corresponding excess for those rectangles to the right of $\bar{x}$ will be 
positive. This may be readily understood by considering these little areas 
as approximate triangles whose bases are negative or positive according 
as they are to the left or right of $\bar{x}$. These excesses for all the 
rectangles, both positive and negative, are involved in taking the 
summation $\dfrac{1}{N} \sum f_i (x_i - \bar{x})^r$ for moments. When r is an odd number, 
as 1 or 3, the excesses show up with their algebraic signs and 
therefore, over the range of the distribution, the positive excesses just 
about offset the negative ones. But in the case of the even moments, 
all the excesses now become positive so that the errors accumulate 
and the final results for these moments are too large. 

To reduce these errors due to grouping, W. F. Sheppard has 
demonstrated¹ that the following corrections should be applied. It should 
be noticed that as we state them here they should be applied only 
where the class interval is unity, i.e., in the u unit. 

Corrected $\mu_{2:u}$ = uncorrected $\mu_{2:u} - \dfrac{1}{12}$

Corrected $\mu_{3:u}$ = uncorrected $\mu_{3:u}$

Corrected $\mu_{4:u}$ = uncorrected $\mu_{4:u} - \dfrac{1}{2}$ (uncorrected $\mu_{2:u}$) $+ \dfrac{7}{240}$

Here $\dfrac{7}{240} = 0.02917$. 

1 Students familiar with more advanced mathematics will find an interesting 
discussion of systematic errors and references to papers dealing with Sheppard's 
corrections in an article by H. C. Carver, Annals of Mathematical Statistics, 
vol. 7, p. 154. 



Example. For Table 18 we have 

Corrected $\mu_{2:u} = 1.720 - 0.083 = 1.637$

$\sigma_u = \sqrt{1.637} = 1.28$

Corrected $\sigma_x = 10(1.28) = 12.8\%$

Corrected $\mu_{4:u} = 9.4096 - (1.72)/2 + 7/240 = 8.5788$

$\alpha_4 = 8.5788/(1.637)^2 = 3.20$

The values of $\bar{x}$ and $\mu_3$ remain unchanged. 
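The corrections are simple enough to express in a few lines. The sketch below applies them to the uncorrected moments of Table 18 in the u unit, then converts the standard deviation to the x unit with c = 10; the variable names are illustrative assumptions.

```python
import math

# Uncorrected moments for Table 18 in the u unit, and the class interval.
mu2_raw, mu4_raw, c = 1.72, 9.4096, 10

# Sheppard's corrections, valid when the class interval is unity.
mu2_corr = mu2_raw - 1 / 12
mu4_corr = mu4_raw - mu2_raw / 2 + 7 / 240

sigma_x_corr = c * math.sqrt(mu2_corr)
alpha4_corr = mu4_corr / mu2_corr ** 2

print(round(mu2_corr, 3), round(sigma_x_corr, 1),
      round(mu4_corr, 4), round(alpha4_corr, 2))
```

Note that the uncorrected $\mu_2$ enters the correction for $\mu_4$, exactly as in the worked example.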


Sheppard’s corrections are valid only for the bell-shaped types of 
distributions. They are not applicable to the J-shaped or U-shaped 
types. Moreover, they constitute a refinement which may not al- 
ways be consistent with the degree of accuracy in the original data. 
The errors of grouping (not mistakes) are usually small compared 
with the errors existing in the raw data. So, it seems that little 
would be gained by their use in a first course. We will occasionally 
use them in an illustration. 



CHAPTER V 

MEASURES OF DISPERSION 


1. Introduction. The concept of variability is fundamental today 
not only in the social sciences but also in the so-called exact physical 
sciences. Modern scientific method recognizes the existence of 
physical, moral, and mental inequalities. The principle of variabil- 
ity has come to be accepted as the natural order in social, economic, 
and physical phenomena. This principle is the very essence of the 
statistical nature of mass phenomena. In this connection, R. A. 
Fisher says: 1 

The conception of statistics as the study of variation is the natural outcome of 
viewing the subject as the study of populations; for a population of individuals 
in all respects identical is completely described by a description of any one indi- 
vidual, together with the number in the group. The populations which are the 
object of statistical study always display variation in one or more respects. 
To speak of statistics as the study of variation also serves to emphasize the 
contrast between the aims of modern statisticians and those of their predecessors. 
For, until comparatively recent times, the vast majority of workers in this field 
appear to have had no other aim than to ascertain aggregate, or average, values. 
The variation itself was not an object of study, but was recognized rather as a 
troublesome circumstance which detracted from the value of the average. . . . Yet, 
from the modern point of view, the study of the causes of variation of any vari- 
able phenomena, from the yield of wheat to the intellect of man, should be 
begun by the examination of the variation which presents itself. The study 
of variation leads immediately to the concept of a frequency distribution. 

It is clearly important, therefore, in studying a distribution, to 
describe how the variates are clustered or scattered around an aver- 
age. Figure 13 shows how two distributions may even have the same 
mean and total frequency, yet differ considerably in variation from 
the mean. Such variation is commonly called dispersion, varia- 
bility, or spread. 

We will consider three measures of dispersion: Quartile Deviation, 
Mean Deviation, and Standard Deviation, of which the last is by far 
the most important. 

1 R. A. Fisher, Statistical Methods for Research Workers, p. 3. Oliver and Boyd, 
London. 



2. The Quartile Deviation. Just as the median selects one point 
of division, we may now take two additional points such that they, 
together with the median, divide the whole distribution into four 
equal parts. These points are called the quartile values. 

The first quartile, denoted by $Q_1$, is that value of x for which 
cum f = N/4. That is, one-fourth of all the variates in the 
distribution are smaller in value than $Q_1$ and three-fourths of them are larger 
than $Q_1$. The second quartile $Q_2$ is that value of x for which cum f 
is N/2 and is therefore the median. The third quartile, denoted 
by $Q_3$, is that value of x for which cum f = 3N/4. Hence fifty per 
cent of the total frequency is included between $Q_1$ and $Q_3$. 

Half of the distance between $Q_3$ and $Q_1$ is called the 
semi-interquartile range or quartile deviation and will be denoted 
by Q. Thus, 

(1)    $Q = \dfrac{Q_3 - Q_1}{2}.$

It should be noted that the median does not necessarily come at the 
mid-point of 2Q, i.e., that a distance Q laid off on either side of 
$Q_2$ would not necessarily reach to $Q_1$ and $Q_3$. (See Figure 14.) (For 
a symmetrical distribution, to be considered later, this would 
be true.) 

As a measure of dispersion, Q gives a fairly good idea of the spread 
of the variates, and is suitable as such a measure in those cases where 
the median would be used as an average. The quartile values $Q_1$ 
and $Q_3$ are found, like the median, by interpolation in the cumulative 
frequency table. 

Example. (a) Find the median and the quartile deviation for the distribution 
of IQ's in Table 6 (§10, Chapter I). (b) Illustrate the measures found in (a) by 
means of a cum f graph. 

   End-x     Cum f
    54.5        0
    64.5        3
    74.5       24
    84.5      102
                     <- Q1
    94.5      284
                     <- Med.
   104.5      589
                     <- Q3
   114.5      798
   124.5      879
   134.5      900
   144.5      905 = N

Solution: 

N/4 = 226.25,  N/2 = 452.5,  3N/4 = 678.75 

$\dfrac{Q_1 - 84.5}{10} = \dfrac{226.25 - 102}{284 - 102}, \qquad Q_1 = 91.3$

$\dfrac{Q_2 - 94.5}{10} = \dfrac{452.5 - 284}{589 - 284}, \qquad Q_2 = 100.02$

$\dfrac{Q_3 - 104.5}{10} = \dfrac{678.75 - 589}{798 - 589}, \qquad Q_3 = 108.8$

$Q = \dfrac{Q_3 - Q_1}{2} = \dfrac{108.8 - 91.3}{2} = 8.75.$


Figure 15 explains graphically the measures obtained by inter- 
polation from a cum f table. For convenience in drawing the figure, 
the quartile labels are put on vertical lines. But one should remem- 
ber that the quartiles are values of x and that it is the horizontal 
distances of the lines from the y-axis that represent these measures. 
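The interpolation used in the example can be sketched as a small routine. The function name `value_at` is hypothetical; the class boundaries and cumulative frequencies are those of the table in the example. Because the sketch does not round $Q_1$ and $Q_3$ before subtracting, it gives Q near 8.73 instead of the 8.75 obtained from the rounded quartiles.

```python
# Upper class boundaries and cumulative frequencies from the example.
ends = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]
N = cum[-1]

def value_at(target):
    """Value of x at which cum f equals target, by linear interpolation."""
    for i in range(1, len(cum)):
        if cum[i] >= target:
            frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
            return ends[i - 1] + frac * (ends[i] - ends[i - 1])
    raise ValueError("target exceeds the total frequency")

q1 = value_at(N / 4)
q2 = value_at(N / 2)        # the median
q3 = value_at(3 * N / 4)
Q = (q3 - q1) / 2           # quartile deviation, relation (1)

print(round(q1, 2), round(q2, 2), round(q3, 2), round(Q, 2))
```

The same routine yields percentiles and deciles when called with mN/100 or nN/10.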
Exercises 

1. Criticize the following "definitions": 

$Q_1 = \dfrac{N}{4}, \quad Q_2 = \dfrac{N}{2}, \quad Q_3 = \dfrac{3N}{4}.$

2. Find $Q_1$ and $Q_3$ from the cumulative frequency table which you made to 
obtain the median for the Glasgow schoolgirl distribution. (Exercise 5 
on page 52.) 

3. Find the quartile deviation Q from your results in Exercise 2. 

4. Find $Q_1$, $Q_2$, $Q_3$ for the distribution in Table 12, and compute Q. 

5. Compute the value of the semi-interquartile range for other distributions 
at the direction of the instructor. 

6. The mth percentile $P_m$ of a frequency distribution is that value of the 
variable x for which cum f = mN/100, where m = 1, 2, ..., 99. The 10th, 
20th, 30th, ..., percentiles are called deciles. Therefore, the nth decile 
$D_n$ is that value of x for which cum f = nN/10, where n = 1, 2, ..., 9. 
Compute several percentiles and deciles of a distribution in the text. 

3. Mean Deviation. As a measure of variation about a central 
value, it would seem appropriate to take an average of all the 
deviations about that central value. In the mean deviation (MD) about 
the mean this is precisely what we do, namely, we find the arithmetic 
mean of the numerical values of the deviations about the mean. 
In summing the deviations, their absolute values are used because 
regardless of whether deviations are positive or negative they have 
the same influence on the amount of variation. Moreover, if their 
algebraic signs are taken account of, the sum of such deviations is 
zero (Theorem VI of Chapter III). Hence we sum them treating 
all deviations as positive. 

In mathematical symbols, vertical bars denote absolute values, 
so we have¹ 

(2)    $MD = \dfrac{1}{N} \sum f_i\,|x_i - \bar{x}|,$

if the x unit is used. When the class interval is the unit, we have 

(3)    $MD = \dfrac{1}{N} \sum f_i\,|u_i - \bar{u}|$

and 

(4)    MD (x unit) = c × MD (u unit). 

It can be proved that the essentially positive function 

$y = \dfrac{1}{N} \sum f_i (x_i - A)^2$

is a minimum when $A = \bar{x}$. (See Theorem II, page 99. Also by 
the calculus dy/dA = 0 when $A = \bar{x}$.) It was in a similar 
investigation to find the value of B for which the function 

$y = \dfrac{1}{N} \sum f_i\,|x_i - B|$

is a minimum, that the median was discovered. When B is the 
median this function is a minimum.² This property of the median has 
some statistical importance in connection with the geographical 
location of centers of industry and population.³ Custom has 
established the use of the mean rather than the median in this measure. 
Hence "mean deviation" usually refers to the mean deviation from 
the mean. It is also called "average deviation." 

1 Since all the data are not concentrated at the midpoints of the intervals, a 
grouping error is involved here as in the formula for $\sigma$ (§10, Chapter IV). But 
the mean deviation is used so infrequently that discussion here of the appropriate 
correction hardly seems warranted. Those who may be interested will find a 
more precise formula in the Handbook of Mathematical Statistics, Rietz and 
others. 

2 For a proof see reference 16, our Introduction. 

3 See p. 85 of Elements of Statistics, Davis and Nelson. Principia Press. 
Example. Find the mean deviation for the grades in Table 18 where the 
mean value of x is 72.5. 

     x        f     |x - x̄|    f|x - x̄|
   34.5       2       38          76
   44.5       3       28          84
   54.5      11       18         198
   64.5      20        8         160
   74.5      32        2          64
   84.5      25       12         300
   94.5       7       22         154
   Total    100                 1036

$MD = \dfrac{1036}{100} = 10.36.$

What was $\sigma$ for this distribution? 


The absolute value of a variable x\ denoted by the symbol \x'\, is 
not very tractable in mathematical operations. Therefore the mean 
deviation is not favored by mathematicians since it is unwieldy in 
the more theoretical and mathematical discussions. Its chief use 
is in experimental work where occasional large and erratic deviations 
are likely to occur. In such cases the standard deviation would tend 
to emphasize these deviations. 

If $m$ of the $N$ variates are greater than the mean, $\bar{x}$, then the mean 
deviation may be written 

\[ MD = \frac{2}{N}\left[(\text{sum of variates greater than } \bar{x}) - m\bar{x}\right]. \]

The student is given a hint in Exercise 34 at the end of Part I on 
how to prove a similar formula for $x_i < \bar{x}$. 
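As a numerical check, the short Python sketch below computes the mean deviation of the grades in the preceding example in two ways: directly from the definition, and from the shortcut that uses only the variates greater than the mean. The variable names are illustrative and are not taken from the text.

```python
# Grouped grades from the example above (Table 18): class marks and frequencies.
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [2, 3, 11, 20, 32, 25, 7]

n_total = sum(freqs)
mean = sum(f * x for x, f in zip(marks, freqs)) / n_total

# Definition: mean of the absolute deviations from the mean.
md_definition = sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n_total

# Shortcut: only the m variates greater than the mean are needed.
m = sum(f for x, f in zip(marks, freqs) if x > mean)
sum_above = sum(f * x for x, f in zip(marks, freqs) if x > mean)
md_shortcut = 2 * (sum_above - m * mean) / n_total

print(mean, md_definition, md_shortcut)
```

Both routes should agree with the value 10.36 found in the example, because the positive and negative deviations from the mean have equal totals.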

4. The Standard Deviation. To overcome the difficulty of nega- 
tive deviations and the use of absolute value signs, the deviations 
about the mean may be squared and the mean of these squares taken. 
To get back into the original linear units, we take the positive square 
root of this result, and have 

\[ \sigma = \left[\frac{1}{N}\sum f\,(x - \bar{x})^2\right]^{1/2} \tag{5} \]

as defined before. The standard deviation measures the same kind 
of phenomenon as the mean deviation and this approach to it is 
frequently satisfactory to a student who otherwise finds it difficult 
to understand. 1 

For a common type of distribution, the standard deviation is 
approximately twenty-five per cent greater than the mean deviation. 
Speaking more accurately, this is true of a normal distribution (to be 
considered in Chapter VI) for which the relation is $MD = \tfrac{4}{5}\sigma$ 
(approximately). 
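For reference, the exact ratio for a normal distribution is $MD/\sigma = \sqrt{2/\pi}$. This is a standard result that is not derived in this section; the two lines below merely evaluate it so that it can be compared with the rule of thumb stated above.

```python
import math

ratio = math.sqrt(2 / math.pi)   # exact MD / sigma for a normal distribution
print(round(ratio, 4))           # 0.7979, close to four fifths
print(round(1 / ratio, 4))       # 1.2533, i.e. sigma is about 25% greater than MD
```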

It is often convenient to have a name for “ the square of the 
standard deviation,” and for this purpose the term “ variance ” has 
been introduced. Thus $\sigma$ denotes standard deviation and $\sigma^2$ denotes 
variance. 

Although definition (5) is the basic concept which the student 
should have for the standard deviation, nevertheless in actual practice 
it is seldom desirable to compute $\sigma$ directly from that definition. 
For a frequency distribution the method is shown in the chapter on 
moments. However, we will give an additional illustration here. 

Example. Find the mean and the standard deviation of Table 9, using 
Charlier’s check and Sheppard’s correction. 

Solution: (See Table 21, p. 88.) 

Charlier’s check: 

\[ \sum f(u+1)^2 = \sum fu^2 + 2\sum fu + N \]
\[ 2471 = 2365 + 2(-447) + 1000 = 2471 \]

Computations: 

\[ \bar{x} = 49.5 + 4(-.447) = 47.712 \text{ lbs.} \]
\[ \mu_2 \text{ (in } u \text{ units)} = \nu_2 - \bar{u}^2 = 2.365 - (-.447)^2 = 2.165 \]

1 The term “standard deviation” was proposed by Pearson and is now used by 
almost all English writers. As originally defined by Pearson, this is the square 
root of the mean of the squares of deviations taken from the mean of the 
distribution, and is not to be used when deviations are measured from any other 
reference point. Pearson uses the term “root-mean-square” for a similar 
measure when the deviations are taken around any origin other than the mean. — 
Walker, History of Statistical Method, p. 54. 
Using Sheppard’s corrections, 

\[ \text{Corrected } \mu_2 \text{ (in } u \text{ units)} = 2.165 - .083 = 2.082 \]
\[ \mu_2 \text{ (in } x \text{ units)} = 16(2.082) = 33.312 \]
\[ \sigma_x = \sqrt{33.312} = 5.772 \text{ lbs.} \]


Table 21 — Weights of Glasgow School Children 

    Weight ($x$)     $f$     $u$     $fu$    $fu^2$   $f(u+1)^2$
    29.5 lbs.          1     -5       -5       25        16
    33.5              14     -4      -56      224       126
    37.5              56     -3     -168      504       224
    41.5             172     -2     -344      688       172
    45.5             245     -1     -245      245         0
    49.5             263      0        0        0       263
    53.5             156      1      156      156       624
    57.5              67      2      134      268       603
    61.5              23      3       69      207       368
    65.5               3      4       12       48        75

    Sums            1000            -447     2365      2471
    (Sums)/N           1            -.447   2.365
                                   ($\bar{u}$)  ($\nu_2$)

It will be proved later that for a certain ideal type of distribution 
which is often approximated in practical statistics the range $\bar{x} \pm \sigma_x$ 
includes about two thirds of the variates. Assuming the above 
distribution is of this type we could say that about two thirds of the 
children weighed between 42 pounds and 53.5 pounds. Such a state- 
ment assists one in comprehending certain characteristics of the data 
though the distribution actually may not be before him. 

It is understood that the method of computation described above 
is to be used when the class marks are equispaced. If the class 
intervals are unequal we must choose $c = 1$ unless the $x$’s denoting 
the class marks have a common factor $c$. When $c = 1$, $u$ becomes 
$u = x - x_0$, and the work may be simplified a little by an appropriate 
choice of $x_0$. 
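The coded-unit computation for Table 21 can be retraced in a few lines of Python. This is only a sketch of the arithmetic in the example, with Sheppard's correction taken as the subtraction of 1/12 (the .083 used above) from the second moment in class units; the variable names are illustrative.

```python
import math

# Table 21: class marks (lbs.) and frequencies.
marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

x0 = 49.5   # provisional origin, a class mark near the middle
c = 4.0     # class interval

n = sum(freqs)
u = [round((x - x0) / c) for x in marks]          # coded units -5, ..., 4

sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))
sum_fu1_sq = sum(f * (ui + 1) ** 2 for f, ui in zip(freqs, u))

# Charlier's check: sum f(u+1)^2 must equal sum fu^2 + 2 sum fu + N.
charlier_ok = sum_fu1_sq == sum_fu2 + 2 * sum_fu + n

u_bar = sum_fu / n
mean = x0 + c * u_bar
mu2_u = sum_fu2 / n - u_bar ** 2                  # second moment about the mean, u units
mu2_u_corrected = mu2_u - 1 / 12                  # Sheppard's correction
sigma = c * math.sqrt(mu2_u_corrected)

print(charlier_ok, round(mean, 3), round(sigma, 3))
```

The small difference between this standard deviation and the 5.772 of the text comes only from rounding 2.082 before the square root is taken.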
Exercises 

1. (Pearson). The following data represent the percentage of ash-content in 
280 wagon tests of a certain kind of coal. Find the mean and the standard 
deviation of the distribution: 

    Percentage Ash-Content    Frequency
         3.0- 3.9                  1
         4.0- 4.9                  7
         5.0- 5.9                 28
         6.0- 6.9                 78
         7.0- 7.9                 84
         8.0- 8.9                 45
         9.0- 9.9                 28
        10.0-10.9                  7
        11.0-11.9                  2

Ans. $\bar{x} = \ldots$, $\sigma = 1.36\%$. 
2. (Camp). Find the mean wage and the standard deviation of the following 
data: 

    Class (dollars)    Frequency
      4.50- 5.99           43
      6.00- 7.49           99
      7.50- 8.99          152
      9.00-10.49          178
     10.50-11.99          160
     12.00-13.49           40
     13.50-14.99           25
     15.00-16.49            3

Ans. $N = 700$, $\bar{x} = \$9.42$, $\sigma_x = \$2.19$. 

3. Given $\sigma_x = 2.19$ for the following $(x, f)$ distribution, find $\sigma_v$ and $\sigma_u$ for the 
$(v, f)$ and $(u, f)$ distributions, respectively. 

    $f$    43     99    152    178    160     40     25      3
    $x$     0    1.5    3.0    4.5    6.0    7.5    9.0   10.5
    $v$     0      1      2      3      4      5      6      7
    $u$    -3     -2     -1      0      1      2      3      4

What relation and theorem in Chapter IV does this illustrate? 

4. Find the variance $\sigma_x^2$ of Table 16 (§8, Chapter III). 

5. Compute the value of the ratio $MD/\sigma$ for the data in Exercise 1 above. 

6. Find the mean and standard deviation for the data in Table 10. 

7. Find the mean and standard deviation for the data in Table 11. 

8. Transform the variates of the following distribution into standard units: 

    $x$     2     4     6     8    10    12    14    16    18    20
    $f$     1     9    36    84   126   126    84    36     9     1
    $t$                                  1/3     1   5/3   7/3     3

(The entries in the $t$ row are some of the answers.) 


5. Relative Dispersions. The full significance of different values 
of $\sigma$ can be obtained only by experience, but it is obvious that a small 
standard deviation indicates that the variates are closely clustered 
about the mean; whereas a large standard deviation indicates that 
these values are spread out widely from the mean. (See Figure 13.) 

The size of variates usually influences not only the mean but also 
deviations from the mean. In other words, the magnitudes of the 
deviations from the mean seem to be dependent, in some degree, upon 
the magnitude of the mean. In comparing dispersion in distribu- 
tions, we may correct for differences in the average magnitudes of 
positive variates by taking the ratio of the standard deviation to the 
mean. Thus, the quantity 

\[ V = \frac{\sigma}{\bar{x}} \tag{6} \]

is known as the coefficient of variation. It is obviously an abstract 
number, being independent of the units of measurement, and it is 
usually expressed as a percentage. 

The use of (6) may be misleading in situations where the origin 
from which the data are measured is somewhat arbitrary. Cases 
in point are temperature measurements and certain psychological 
data. Further discussion of such limitations of (6) will be found in 
references 2, 14, and 15, listed in the Introduction. 
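A brief Python sketch may make both points concrete. It evaluates the coefficient of variation for the two worked examples of this chapter (the weights of Table 21 and the corn yields of Table 24), and then shows how a mere shift of origin changes the coefficient. The temperature figures are hypothetical, chosen only to illustrate the limitation just mentioned.

```python
def coefficient_of_variation(sigma, mean):
    """Ratio of the standard deviation to the mean, expressed as a percentage."""
    return 100.0 * sigma / mean

# Standard deviations and means taken from the worked examples.
v_weights = coefficient_of_variation(5.772, 47.712)   # Table 21, lbs.
v_yields = coefficient_of_variation(4.67, 34.4)       # Table 24, bushels

# Hypothetical temperatures: the same readings on two scales that differ
# only in origin. Sigma is unchanged, the mean is not, so V changes.
sigma_temp = 5.0
v_celsius = coefficient_of_variation(sigma_temp, 20.0)
v_kelvin = coefficient_of_variation(sigma_temp, 20.0 + 273.15)

print(round(v_weights, 1), round(v_yields, 1))
print(round(v_celsius, 1), round(v_kelvin, 1))
```

The two temperature results describe identical data, which is why (6) should be used with care whenever the origin of measurement is arbitrary.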

6. Scaling a Distribution in Terms of $\sigma$. Suppose we lay off 
intervals of length $\sigma$ on either side of the mean (Figure 16). Then 
for a certain type of distribution known as the normal curve (which 
will be considered in the next chapter) the following properties can 
be proved: 

(1) The percentage of the total frequency lying outside the range 
$\bar{x} \pm \sigma$ is 32% approximately. 

(2) The percentage outside $\bar{x} \pm 2\sigma$ is 5% approximately. 

(3) The range $\bar{x} \pm 3\sigma$ includes practically the whole distribution, 
i.e., the total range is $6\sigma$ approximately. 

The student will recognize that these ranges are, in standard units, 
$t = \pm 1$, $t = \pm 2$, $t = \pm 3$, respectively. These results follow from 
the relation 

\[ t = \frac{x - \bar{x}}{\sigma}, \qquad x = \bar{x} + t\sigma. \]
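The three percentages quoted for the normal curve can be checked with the error function in Python's standard library. This is only a numerical illustration, not the proof promised for the next chapter; the function name is an assumption made for the sketch.

```python
import math

def percent_outside(t):
    """Percentage of a normal distribution lying outside mean +/- t * sigma."""
    inside = math.erf(t / math.sqrt(2))
    return 100.0 * (1.0 - inside)

for t in (1, 2, 3):
    print(t, round(percent_outside(t), 2))
```

The values obtained are close to the rounded figures 32% and 5% given above, and the third shows how little of the distribution lies beyond three standard deviations.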


Sometimes it is important in a statistical analysis to know how 
nearly the given variates are distributed in accordance with the 
above property of the normal curve. The distribution of Table 21 
has been scaled off in this manner, with the results shown in Table 
22. Figure 17 will be helpful in verifying them. 

Fig. 17 — Distribution of Table 21 Scaled Off in Units of $\sigma$ 

We will verify here the 34.8% given in Table 22, and the student 
is asked to verify the others in Exercise 2. The range $\bar{x} \pm \sigma$ (Figure 
17) evidently includes all the variates represented by the two central 
rectangles and proportionate parts of the two adjoining rectangles. 
From 39.50 to 41.94 is 2.44, and since the variates are assumed to be 
uniformly distributed over the class interval we have 172(2.44/4) = 
104.92 for the proportionate number to be excluded in the class 
39.5-43.5. Hence the number below $\bar{x} - \sigma$ is (1 + 14 + 56 + 
104.92) = 175.92. Similarly, from 53.484 to 55.5 is 2.016, and we 
have 156(2.016/4) = 78.624 as the proportionate number excluded 
in the class 51.5-55.5. Hence the total above $\bar{x} + \sigma$ is (78.624 + 
67 + 23 + 3) = 171.624. So the total number outside $\bar{x} \pm \sigma$ is 
(171.624 + 175.92) = 347.544, or 348 to the nearest unit, which is 34.8% 
of the 1000 variates. 

Table 22 — Results of Scaling Off Table 21 

$\bar{x} = 47.712$, $\sigma_x = 5.772$ 

    Limits of the range                                          Range                 Frequency outside the given range
                                                                                        Number     Percent
    $\bar{x} - \sigma$ = 41.940,   $\bar{x} + \sigma$ = 53.484       $\bar{x} \pm \sigma$        348       34.8
    $\bar{x} - 2\sigma$ = 36.168,  $\bar{x} + 2\sigma$ = 59.256      $\bar{x} \pm 2\sigma$        60        6.0
    $\bar{x} - 3\sigma$ = 30.396,  $\bar{x} + 3\sigma$ = 65.028      $\bar{x} \pm 3\sigma$         3        0.3

This result could also be obtained as follows: By forming a cum $f$ table and 
interpolating in the end $x$ column we find 

    cum f at x = 53.484:                            828
    cum f at x = 41.940:                            176
    Number in the ($\bar{x} \pm \sigma_x$) interval:   652
    Number outside this interval:                   348
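The interpolation just described is easy to mechanize. The following Python sketch, under the same assumption that the variates are spread uniformly over each class interval, computes the cumulative frequency below any point of Table 21 and from it the number outside $\bar{x} \pm \sigma$. The function name is illustrative.

```python
# Table 21: class marks (lbs.), frequencies, and class interval.
marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]
width = 4.0

def cumulative_below(x):
    """Number of variates below x, assuming a uniform spread within each class."""
    total = 0.0
    for mark, f in zip(marks, freqs):
        lower = mark - width / 2
        upper = mark + width / 2
        if x >= upper:
            total += f                            # whole class lies below x
        elif x > lower:
            total += f * (x - lower) / width      # proportionate part of the class
    return total

mean, sigma = 47.712, 5.772
below = cumulative_below(mean - sigma)
above = sum(freqs) - cumulative_below(mean + sigma)
outside = below + above

print(round(below, 3), round(above, 3), round(outside, 3))
```

The three printed numbers correspond to the 175.92, 171.624, and 347.544 obtained by hand above.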


7. Semi-interquartile Range in Terms of $\sigma$. The range $(Q_3 - Q_1)/2$ 
when expressed in units of $\sigma$ has a significance in a normal distribution, 
as will be shown later. We will denote this by $s$; hence 

\[ Q = \frac{Q_3 - Q_1}{2} \qquad \text{and} \qquad s = \frac{Q}{\sigma}. \]

For the present we merely calculate its value in the exercises below. 
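For comparison with the values of $s$ found in the exercises, the value for an exactly normal distribution can be obtained from the quartiles of the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; it anticipates a result that the text establishes later.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)
q1 = standard_normal.inv_cdf(0.25)   # first quartile
q3 = standard_normal.inv_cdf(0.75)   # third quartile

semi_interquartile = (q3 - q1) / 2
s = semi_interquartile / standard_normal.stdev

print(round(s, 4))   # 0.6745
```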


Exercises 

1. Find the mean and standard deviation for the distribution of Lengths of 
Telephone Calls, given in Table 8 (Chapter I). Use Charlier’s check. 

2. In the three distributions named, show that the percentages outside $\bar{x} \pm t\sigma$ for 
$t = 1$, 2, and 3, are as stated in Table 23. Verify also the values of $s$. 

Table 23 

                                    Percent Outside
    Distribution          N     $\bar{x} \pm \sigma$   $\bar{x} \pm 2\sigma$   $\bar{x} \pm 3\sigma$      $s$
    Glasgow girls       1000        34.8         6.0          0.3         0.675
    Telephone calls      995        32.7         5.0          0.4         0.69
    Span                2000        31.8         4.2          0.5         0.665

8. N Small. Ungrouped Data. When $N$ is small it is seldom desirable 
to attempt an arrangement of the variates into a frequency 
distribution. Moreover, in this case, the values of $\alpha_3$ and $\alpha_4$ are not 
usually needed because the applications of these measures relate to 
characteristics of large distributions. Therefore, only the mean and 
standard deviation are usually required for a small set of ungrouped 
data. The following methods will help the student become familiar 
with the several formulas for $\sigma$, which may be used in this case. 

Table 24 — Average Yields of Corn in Bushels per Acre 
for a Certain Section in Illinois from 1901-1920 

    Year     Yield ($x$)     $u$     $u^2$
    1901         21          -15      225
    1902         39            3        9
    1903         32           -4       16
    1904         37            1        1
    1905         40            4       16
    1906         36            0        0
    1907         36            0        0
    1908         32           -4       16
    1909         36            0        0
    1910         39            3        9
    1911         33           -3        9
    1912         40            4       16
    1913         27           -9       81
    1914         29           -7       49
    1915         36            0        0
    1916         30           -6       36
    1917         38            2        4
    1918         36            0        0
    1919         36            0        0
    1920         35           -1        1

    Totals    N = 20         -32      488

Method I. The indirect method involving the $u$ unit may still be 
used for finding the first and second moments. Since each variate 
is being treated separately $f = 1$, and we compute the values of 

\[ \nu_r = \frac{1}{N}\sum u^r \quad \text{for } r = 1 \text{ and } 2. \]

If the values of $x$ are unequally spaced 
we take $c = 1$ and let $u = x - x_0$, which changes the origin but not 
the units. In other words, the procedure is the same as for a frequency 
distribution except that $f = 1$ and $c = 1$. 

Example. Find the mean and standard deviation for Table 24, $N = 20$. 
We choose $x_0 = 36$. 


Table 25 

     $x$     $x' = x - \bar{x}$     $x'^2$
     21          -13.4           179.56
     27           -7.4            54.76
     29           -5.4            29.16
     30           -4.4            19.36
     32           -2.4             5.76
     32           -2.4             5.76
     33           -1.4             1.96
     35            0.6              .36
     36            1.6             2.56
     36            1.6             2.56
     36            1.6             2.56
     36            1.6             2.56
     36            1.6             2.56
     36            1.6             2.56
     37            2.6             6.76
     38            3.6            12.96
     39            4.6            21.16
     39            4.6            21.16
     40            5.6            31.36
     40            5.6            31.36

    688     $\sum |x'| = 73.6$     436.80
Computations: 

\[ \nu_1 = \bar{u} = \frac{-32}{20} = -1.6; \qquad \bar{x} = x_0 + \bar{u} = 36 - 1.6 = 34.4 \text{ bushels} \]
\[ \nu_2 = \frac{488}{20} = 24.40; \qquad \mu_2 = \nu_2 - \bar{u}^2 = 21.84. \]

Therefore, 

\[ \sigma_x = \sigma_u = \sqrt{21.84} = 4.67 \text{ bushels.} \]

Method II. When $f = 1$, formula (5) becomes 

\[ \sigma_x = \left[\frac{1}{N}\sum (x_i - \bar{x})^2\right]^{1/2}, \tag{7} \]

and sometimes it is best to compute the standard deviation directly 
from this definition, without the use of the $u$ unit. Thus the origin 
is placed at the mean and all indirect methods are abandoned. If 
the mean deviation is also desired, clearly this method should be 
used. It is exemplified in Table 25 for the preceding example, and 
the variates have been arranged in order of magnitude. 

\[ \bar{x} = \frac{688}{20} = 34.4 \text{ bushels} \]
\[ \sigma_x^2 = \frac{436.80}{20} = 21.84 \]
\[ \sigma_x = 4.67 \text{ bushels} \]
\[ MD = \frac{1}{N}\sum |x'| = \frac{73.6}{20} = 3.68 \text{ bushels.} \]


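Method III. From the relation 

\[ \mu_2 = \nu_2 - (\nu_1)^2 \]

we have 

\[ \mu_2 = \frac{1}{N}\sum x^2 - \bar{x}^2 \]

when $f = 1$. Therefore $\sigma$ may be written 

\[ \sigma = \left[\frac{1}{N}\sum x^2 - \bar{x}^2\right]^{1/2}. \tag{8} \]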
Table 26 

     $x$      $x^2$
     21        441
     27        729
     29        841
     30        900
     32       1024
     32       1024
     33       1089
     35       1225
     36       1296
     36       1296
     36       1296
     36       1296
     36       1296
     36       1296
     37       1369
     38       1444
     39       1521
     39       1521
     40       1600
     40       1600

    688     24,104


This method is perhaps the best when the values of x are not large or 
when a table of squares is available. It is illustrated below for the 
preceding example. (See Table 26 .) 


Computations: 

\[ \bar{x} = \frac{1}{N}\sum x = \frac{688}{20} = 34.4 \text{ bushels} \]
\[ \bar{x}^2 = (34.4)^2 = 1183.36 \]
\[ \frac{1}{N}\sum x^2 = \frac{24104}{20} = 1205.20 \]
\[ \sigma_x = [1205.20 - 1183.36]^{1/2} = (21.84)^{1/2} = 4.67 \text{ bushels.} \]
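The three methods can be placed side by side in a short Python sketch using the yields of Table 24. The variable names are illustrative; the arithmetic follows the computations shown above.

```python
import math

# Table 24: corn yields, 1901-1920, in bushels per acre.
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]
n = len(yields)

# Method I: deviations u = x - x0 from a provisional origin.
x0 = 36
u = [x - x0 for x in yields]
nu1 = sum(u) / n
nu2 = sum(ui ** 2 for ui in u) / n
mean_1 = x0 + nu1
sigma_1 = math.sqrt(nu2 - nu1 ** 2)

# Method II: deviations taken directly from the mean, formula (7).
mean_2 = sum(yields) / n
sigma_2 = math.sqrt(sum((x - mean_2) ** 2 for x in yields) / n)
mean_deviation = sum(abs(x - mean_2) for x in yields) / n

# Method III: from the sum of squares of the variates, formula (8).
sigma_3 = math.sqrt(sum(x ** 2 for x in yields) / n - mean_2 ** 2)

print(round(mean_1, 1), round(sigma_1, 2), round(sigma_2, 2), round(sigma_3, 2))
print(round(mean_deviation, 2))
```

All three standard deviations agree, as they must, since the methods differ only in the order of the arithmetic.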


Miscellaneous Exercises 

1. (a) Verify that the algebraic sum of the numbers in the $x'$ column of Table 25 
is zero. 

(b) Verify the value of mean deviation given for Table 25. 
2. Using your own judgment as to the most appropriate method, find the mean 
and standard deviation for each of the two sets of data, $x_1$ and $x_2$: 

    $x_1$:   88  95  68  73  75  88  57  68  62  79  73  74  78
             80  57  65  69  74  78  72  59  47  56  67  43

    $x_2$:   82  86  75  78  72  79  63  65  67  75  68  70  79
             78  51  58  65  69  68  83  80  42  43  48  47

Answers: $\bar{x}_1 = 69.80$, $\sigma_1 = 12.13$; $\bar{x}_2 = 67.64$, $\sigma_2 = 12.68$. 


3. Complete the computations and find the mean and variance of the following 
distribution: 



Hint. Here we let $v = y - y_0$. Then $\bar{y} = \bar{v} + y_0$, and $\sigma_y^2 = \sigma_v^2$ since $c = 1$. 

(See Theorem on p. 69.) 

Ans. $\bar{y} = 87.31$, $\sigma_y^2 = 56.66$. 

4. Data have been gathered showing the points scored on a mental test by 
290 prospective employees and the per cent of standard production 
attained by these same 290 persons after being employed. 1 The following 
statistics were obtained: 

Mental test: mean = 43.33 pts. 
$\sigma$ = 9.25 pts. 

Productive ability: mean = 92.02% 
$\sigma$ = 24.47% 

(a) Compare the relative dispersion in mental test and productive ability. 

(b) What factors, other than mental level, may have affected dispersion 
under factory conditions? 

1 Wembridge, “Experiment and Statistics in the Selection of Employees,” 
Journal of the American Statistical Association , March 1923, p. 605. 
5. Read and abstract the article “Variability,” Journal of Educational Research, 
vol. 4, no. 3, pp. 221-26. 

6. Find the median for Table 26. 

7. Find $\bar{x}$, $\sigma_x$, MD, and $Q$ for the following distribution. 

    mid-$x$     2     4     6     8    10
    $f$         1     4     6     4     1


8. Show that (8) may be written as follows: 

\[ \sigma = \frac{\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}}{N}. \]

9. If the variates are all equal, say each $x_i = k$, show that $\bar{x} = k$ and $\sigma = 0$. 

10. For a set of ungrouped data it is found that $N = 15$, $\sum x = 480$, $\sum x^2 = 15{,}735$. 
Find $\bar{x}$ and $\sigma_x$. 

11. Find the variance of the following data. 


5.7 6.2 6.5 6.0 6.3 5.8 5.7 6.0 6.0 5.8 


Ans. $\sigma_x^2 = .064$. 

12. Prove the identity: 

\[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2 = (x_1^2 + x_2^2 + \cdots + x_N^2) - N\bar{x}^2. \]

13. Compute the mean deviation (from the mean) for the following data: 

    $x$     2     4     6     8    10    17
    $f$     1     6    10     7     2     2

Ans. $MD = 33/14$. 

14. Verify the identity (where $\bar{x}$ is the mean of $x_1$ and $x_2$): 

\[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = \tfrac{1}{2}(x_1 - x_2)^2, \]

and thus show that, for two variates, 

\[ \sigma = \frac{|x_1 - x_2|}{2}. \]

15. Verify the identity (where $\bar{x}$ is the mean of $x_1$, $x_2$, $x_3$): 

\[ 3(x_1 - x_2)^2 + (x_1 + x_2 - 2x_3)^2 = 6\sum_{i=1}^{3}(x_i - \bar{x})^2. \]


9. The Standard Deviation of the Combination of Sets. The 
following theorems involving $\sigma$ are interesting in themselves and 
have useful applications. 

The relation $\mu_2 = \nu_2 - \nu_1^2$ is true in a more general sense than we 
have previously used. Its generalized meaning will be revealed in 
our first theorem. 

Theorem I. The second moment about the mean equals the second 
moment about an arbitrary point $P(x_0, 0)$ minus the square of the distance 
between the mean and $P$. 

Stated in symbols the theorem may be clearer. Suppose we have 
a set of $N$ variates whose mean is $\bar{x}$. Graphically, $\bar{x}$ is a point on the 
x-axis. Then if $P$ is any other point on the x-axis, according to 
Theorem I we have 

\[ \frac{1}{N}\sum (x - \bar{x})^2 = \frac{1}{N}\sum (x - x_0)^2 - (\bar{x} - x_0)^2. \tag{9} \]

To prove this relation we may write 

\[ (x - \bar{x}) = (x - x_0) - (\bar{x} - x_0). \]

Then 

\[ \frac{1}{N}\sum (x - \bar{x})^2 = \frac{1}{N}\sum \left[(x - x_0) - (\bar{x} - x_0)\right]^2, \]

the right member of which simplifies into the right member of (9). 

The generality of the theorem consists in extending the original 
definition of $\nu_2$ and $\nu_1$ so that they refer to moments about any point 
$P$ on the x-axis (except $\bar{x}$), and not merely about zero. Thus now, 

\[ \nu_2 = \frac{1}{N}\sum (x - x_0)^2. \]

If we take $x_0 = 0$ we have the original definition of $\nu_2$. Also, when $P$ moves 
back to zero, we see that $\nu_1$ becomes $\bar{x}$. In other words, the original 
definitions of the $\nu$’s are merely the more general definitions when 
zero is the value chosen for the arbitrary point. (See (1a) of Chapter IV.) 

Theorem II. The sum of the squares of deviations of the variates 
from their mean is less than the sum of the squares of the deviations of 
the variates from any other value. Therefore $\sigma$ is less than any similar 
“root-mean-square.” 

The proof consists in showing that $\mu_2 < \nu_2$, which is left to the 
student as an exercise. 
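Theorems I and II can be illustrated numerically before they are proved. The Python sketch below takes the corn yields of Table 24 and a few arbitrary points $x_0$ (chosen here only for illustration), and checks that relation (9) holds and that the second moment about the mean is the smallest.

```python
import math

# Table 24: corn yields in bushels per acre.
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]
n = len(yields)
mean = sum(yields) / n

def second_moment_about(x0):
    """Mean of the squared deviations of the variates from the point x0."""
    return sum((x - x0) ** 2 for x in yields) / n

mu2 = second_moment_about(mean)            # second moment about the mean

checks = []
for x0 in (0, 30, 36, 40):                 # arbitrary points P on the x-axis
    nu2 = second_moment_about(x0)
    theorem_1 = math.isclose(mu2, nu2 - (mean - x0) ** 2)   # relation (9)
    theorem_2 = mu2 < nu2                                   # mean gives the least value
    checks.append((x0, theorem_1, theorem_2))

print(round(mu2, 2), checks)
```

A check of this kind is no substitute for the algebraic proof, but it shows what the two theorems assert.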

Theorem III. Let there be one set of $n_1$ variates $x_{1i}$ $(i = 1, 2, \ldots, n_1)$ 
and another set of $n_2$ variates $x_{2i}$ $(i = 1, 2, \ldots, n_2)$ and let $\bar{x}$ be the 
mean of the combined sets (Theorem VIII, Chapter III). The variance 
$\sigma^2$ of the set formed by the combination of these two sets is given by 
the following formula: 

\[ N\sigma^2 = \sum_{1}^{n_1}(x_{1i} - \bar{x})^2 + \sum_{1}^{n_2}(x_{2i} - \bar{x})^2 \tag{10} \]

where 

\[ N = n_1 + n_2. \]

Proof: The proof consists in showing that 

\[ \sum_{1}^{n_1}(x_{1i} - \bar{x})^2 + \sum_{1}^{n_2}(x_{2i} - \bar{x})^2 = \sum_{1}^{n_1 + n_2}(x_i - \bar{x})^2 \]

which is left as an exercise for the student. 

The above theorem is not very important in itself but it is useful 
in proving the next theorem which gives the relation between the 
variance of a composite set and the variances of sub-sets. 

Theorem IV. Let the frequency, mean, and standard deviation be 
denoted by $n_1$, $\bar{x}_1$, and $\sigma_1$ for one set of variates and by $n_2$, $\bar{x}_2$, and $\sigma_2$ for a 
second set. The variance $\sigma^2$ of the composite set is given by the following 
relation: 

\[ N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2 \]

where $N = n_1 + n_2$, $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the mean of 
the composite set. 

Proof: For the $n_1$ set, $\bar{x}$ may be regarded as an arbitrary point $P$. 
Hence by Theorem I we have 

\[ \frac{1}{n_1}\sum_{1}^{n_1}(x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1}\sum_{1}^{n_1}(x_{1i} - \bar{x})^2 - (\bar{x}_1 - \bar{x})^2. \]

Multiplying through by $n_1$ this becomes 

\[ n_1\sigma_1^2 = \sum_{1}^{n_1}(x_{1i} - \bar{x})^2 - n_1 d_1^2. \tag{11} \]

Similarly for the $n_2$ group we have 

\[ n_2\sigma_2^2 = \sum_{1}^{n_2}(x_{2i} - \bar{x})^2 - n_2 d_2^2. \tag{12} \]

Adding (11) and (12), and using (10), we obtain 

\[ n_1\sigma_1^2 + n_2\sigma_2^2 = N\sigma^2 - n_1 d_1^2 - n_2 d_2^2. \]

Hence, 

\[ N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2. \tag{13} \]
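Relation (13) can be tried on the two sets of 25 variates given in Exercise 2 of the Miscellaneous Exercises. The Python sketch below computes the variance of the combined set from the statistics of the two sub-sets and compares it with the variance computed directly from all 50 variates. It uses `fmean` and `pvariance` from the standard library; the variable names are illustrative.

```python
from statistics import fmean, pvariance

x1 = [88, 95, 68, 73, 75, 88, 57, 68, 62, 79, 73, 74, 78,
      80, 57, 65, 69, 74, 78, 72, 59, 47, 56, 67, 43]
x2 = [82, 86, 75, 78, 72, 79, 63, 65, 67, 75, 68, 70, 79,
      78, 51, 58, 65, 69, 68, 83, 80, 42, 43, 48, 47]

n1, n2 = len(x1), len(x2)
N = n1 + n2
mean1, mean2 = fmean(x1), fmean(x2)
var1, var2 = pvariance(x1), pvariance(x2)     # variances with divisor n

mean = (n1 * mean1 + n2 * mean2) / N          # mean of the composite set
d1, d2 = mean1 - mean, mean2 - mean

# Relation (13), divided through by N.
var_from_13 = (n1 * var1 + n2 * var2 + n1 * d1 ** 2 + n2 * d2 ** 2) / N

# Direct computation from the 50 variates, for comparison.
var_direct = pvariance(x1 + x2)

print(round(mean, 2), round(var_from_13 ** 0.5, 2), round(var_direct ** 0.5, 2))
```

The two standard deviations coincide, and the combined mean is the 68.72 quoted in the exercises of the next section.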


For $k$ sets combined into a single set we can generalize (13) into 
the following relation: 

\[ N\sigma^2 = \sum_{1}^{k} n_i\sigma_i^2 + \sum_{1}^{k} n_i d_i^2 \tag{14} \]

where $N = \sum_{1}^{k} n_i$ and $d_i = \bar{x}_i - \bar{x}$. It is interesting to observe that 
$\frac{1}{N}\sum_{1}^{k} n_i d_i^2$ is the variance of the means of the sub-sets. Thus we have 
the important relation 

\[ \sigma^2 = \frac{1}{N}\sum_{1}^{k} n_i\sigma_i^2 + \sigma_{\bar{x}_i}^2 \tag{14a} \]


which shows that the total variance may be broken up into two parts, 
one of which is the weighted mean of the variances in the sub-sets 
and the other is the variance of their means. These two parts are 
sometimes called the average variance within classes and the variance 
between the means of the classes. They become very important in 
the “ Analysis of Variance ” (which is explained in Part II). 
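The decomposition in (14a) can be seen at work in a small Python sketch. The three sub-sets below are hypothetical, invented only for illustration; the code computes the average variance within the sub-sets, the variance between their means, and the total variance of the combined set.

```python
from statistics import fmean, pvariance

# Hypothetical sub-sets of unequal size.
groups = [[4, 6, 8], [10, 12, 14, 16], [1, 3]]

N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

# Weighted mean of the variances within the sub-sets.
within = sum(len(g) * pvariance(g) for g in groups) / N

# Variance of the sub-set means, each weighted by its frequency.
between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / N

# Variance of the combined set, computed directly.
total = pvariance([x for g in groups for x in g])

print(round(within, 4), round(between, 4), round(total, 4))
```

The first two numbers add up to the third, which is the content of (14a).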

Corollary I. Equation (13) may be written in the following form: 

\[ N\sigma^2 = n_1(\sigma_1^2 + \bar{x}_1^2) + n_2(\sigma_2^2 + \bar{x}_2^2) - N\bar{x}^2. \tag{15} \]

Proof: Since 

\[ n_1 d_1^2 = n_1(\bar{x}_1 - \bar{x})^2 = n_1\bar{x}_1^2 - (2n_1\bar{x}_1\bar{x} - n_1\bar{x}^2) \]

and 

\[ n_2 d_2^2 = n_2(\bar{x}_2 - \bar{x})^2 = n_2\bar{x}_2^2 - (2n_2\bar{x}_2\bar{x} - n_2\bar{x}^2) \]

the proof consists in showing that the sum of the terms in the end 
parentheses above reduces to $N\bar{x}^2$. Rearranging these terms their 
sum is 

\[ 2\bar{x}(n_1\bar{x}_1 + n_2\bar{x}_2) - \bar{x}^2(n_1 + n_2), \]

which by Theorem VIII (Chapter III) becomes 

\[ 2\bar{x}N\bar{x} - \bar{x}^2 N = N\bar{x}^2. \]
Generalizing for $k$ groups, (15) becomes 

\[ N\sigma^2 = \sum_{1}^{k} n_i(\sigma_i^2 + \bar{x}_i^2) - N\bar{x}^2. \tag{16} \]

Corollary II. Equation (13) may also be written in the form: 

\[ N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2. \tag{17} \]

The proof consists in showing that 

\[ n_1 d_1^2 + n_2 d_2^2 = \frac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2. \]

This is left as an exercise. 

For purposes of computation, (17) may be more convenient than 
either (13) or (15) because it does not require x, but it does not lend 
itself to a generalization for k sets. Generalizations may be useful 
both for computing and for theoretical purposes. Formula (14) is par- 
ticularly useful in developing the theory of a later section. 
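That (13), (15), and (17) are three forms of one relation can be confirmed on any set of summary statistics. The figures in the Python sketch below are hypothetical and were chosen only to keep the arithmetic simple.

```python
# Hypothetical summary statistics for two sub-sets.
n1, mean1, var1 = 12, 50.0, 9.0
n2, mean2, var2 = 8, 60.0, 16.0

N = n1 + n2
mean = (n1 * mean1 + n2 * mean2) / N     # mean of the composite set

# Form (13): uses the deviations of the sub-set means from the composite mean.
form_13 = (n1 * var1 + n2 * var2
           + n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2) / N

# Form (15): uses the squares of the means.
form_15 = (n1 * (var1 + mean1 ** 2) + n2 * (var2 + mean2 ** 2) - N * mean ** 2) / N

# Form (17): does not require the composite mean at all.
form_17 = (n1 * var1 + n2 * var2 + n1 * n2 / N * (mean1 - mean2) ** 2) / N

print(mean, form_13, form_15, form_17)
```

Form (17) needs only the difference of the two sub-set means, which is the computational convenience mentioned above.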

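For convenience, the formulas of Theorem VIII, Chapter III, are 
repeated here: 

\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \tag{18} \]

\[ \bar{x} = \frac{1}{N}\sum_{1}^{k} n_i\bar{x}_i = \frac{\sum_{1}^{k} n_i\bar{x}_i}{\sum_{1}^{k} n_i} \tag{18a} \]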

Theorem V. Consider $k$ sets. Suppose the second moment of each 
set is taken about the mean, $\bar{x}$, of the combined sets. Let $\nu_2^{(i)}$ represent 
this moment for the $i$th set. Then the variance $\sigma^2$ for the combined sets 
is given by 

\[ N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)} + \cdots + n_k\nu_2^{(k)} = \sum_{1}^{k} n_i\nu_2^{(i)} \tag{19} \]

when $n_i$ represents the frequency in the $i$th set and $\sum_{1}^{k} n_i = N$. 

Proof: We may write (10) in the form 

\[ N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)}. \]

So, generalizing this form of (10) for $k$ sets, we obtain (19). 

The next theorem gives the standard deviation of the distribution 
formed by the first $N$ integers, that is, when $x = 1, 2, 3, \ldots, N$. It is 
useful in cases when the variates are recorded not by measurements 
but by their respective positions when ranked in order with respect 
to some character or property. 

Theorem VI. The standard deviation $\sigma$ of the first $N$ natural numbers 
is given by 

\[ \sigma = \sqrt{\frac{N^2 - 1}{12}}. \tag{20} \]

Proof: By a fundamental definition we have 

\[ \sigma^2 = \frac{1}{N}\sum_{1}^{N} x^2 - \left(\frac{1}{N}\sum_{1}^{N} x\right)^2 \]

and by Theorems IV and V of Chapter III, this becomes 

\[ \sigma^2 = \tfrac{1}{6}(N + 1)(2N + 1) - \tfrac{1}{4}(N + 1)^2 \]

which reduces to 

\[ \sigma^2 = \frac{N^2 - 1}{12} \]

whence we obtain (20). 
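Formula (20) is easily compared with a direct computation. The Python sketch below does so for a few values of $N$ chosen arbitrarily, using `pstdev` (the standard deviation with divisor $N$) from the standard library.

```python
import math
from statistics import pstdev

def sigma_first_n(n):
    """Standard deviation of the integers 1, 2, ..., n by formula (20)."""
    return math.sqrt((n ** 2 - 1) / 12)

comparisons = []
for n in (2, 5, 10, 100):
    direct = pstdev(list(range(1, n + 1)))
    comparisons.append((n, sigma_first_n(n), direct))

for n, by_formula, direct in comparisons:
    print(n, round(by_formula, 4), round(direct, 4))
```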

10. Graphical Representation. We have shown that, if certain 
statistics are given for two sub-sets, 

    Sub-sets           $n_1$    $\bar{x}_1$    $\sigma_1$
                       $n_2$    $\bar{x}_2$    $\sigma_2$
    Composite set      $N$      $\bar{x}$      $\sigma_x$

the corresponding statistics for the composite set may be obtained 
by means of (13) and (18a). We have been thinking of these statistics 
as relating to distributions in the x-direction. The following 
diagrams show how the means and standard deviations of three such 
distributions may be represented geometrically by the points whose 
ordinates are zero and whose abscissas are, respectively, $\bar{x}_1$, $(\bar{x}_1 \pm \sigma_1)$; 
$\bar{x}_2$, $(\bar{x}_2 \pm \sigma_2)$; and $\bar{x}$, $(\bar{x} \pm \sigma_x)$. The points are plotted on three 
different axes to avoid confusion, but they are to be thought of as 
being referred to the same origin and plotted on the same scale. 





It should be clear that Theorems I-IV (§9) will apply to distributions 
in the y-direction as well as in the x-direction. In particular, 
it is obvious that (13) and (18a) hold if we replace $x$ by $y$. Then 
the graphical representation of the means and standard deviations 

    Sub-sets           $n_1$    $\bar{y}_1$    $\sigma_1$
                       $n_2$    $\bar{y}_2$    $\sigma_2$
    Composite set      $N$      $\bar{y}$      $\sigma_y$

is shown below. 

[Figure: the means and standard deviations plotted as points on the y-axis.] 

It will be helpful to discuss one more notion in this connection. 
Suppose the $y$ composite set is made up of $k$ sub-sets and the means 
$\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k$ of these sub-sets are plotted on the y-axis as shown 
by the labels on the left side of the axis in the figure on page 105. 

We will denote the standard deviation of these means by $\sigma_{\bar{y}_i}$. 
Then the points $\bar{y}$, $(\bar{y} \pm \sigma_{\bar{y}_i})$, and $(\bar{y} \pm \sigma_y)$ may be plotted as shown. 
We would expect less variability among the means of the sub-sets 
than among the $y$’s of the composite set, that is, that $\sigma_{\bar{y}_i}$ would be 
less than $\sigma_y$. It is clear that (14) and (14a) hold when $x$ is replaced 
by $y$. 

[Figure: the points $\bar{y}$, $\bar{y} \pm \sigma_{\bar{y}_i}$, and $\bar{y} \pm \sigma_y$ plotted on the y-axis.] 

A grasp of these notions will help in the analysis of Table 27 which 
the student is asked to make in problems 5 and 6 below. 


Exercises 

1. (a) Show that $\nu_1 = \frac{1}{N}\sum (x - x_0) = (\bar{x} - x_0)$. 

(b) Derive equations (9) and (13). If $n_1 = n_2$, what does (13) reduce to? 

2. Given the following information about two sets of data: 

            I                     II
       $n_1 = 20$             $n_2 = 30$
       $\bar{x}_1 = 25$       $\bar{x}_2 = 20$
       $\sigma_1^2 = 5$       $\sigma_2^2 = 4$

Find the mean and variance of the composite set. 

3. Think of the two groups in Exercise 2, page 97, as combined into a single 
set. 

(a) Find the mean of the combined set by formula (18). 

(b) Find the standard deviation of the combined set using result of (a) and 
formula (13). Ans. $\bar{x} = 68.72$, $\sigma = 12.45$. 

4. Using Theorem VI find the mean and standard deviation of the first 25 
natural numbers. 

5. Consider Table 27. Observe that the first and last columns form a frequency 
distribution and that columns (1) to (8) are subdistributions whose totals 
add up to $N = 260$ which is also the sum of the last column. Let $n_i$ 
represent the frequency in the $i$th column and answer the following 
questions: $n_1 = ?$, $n_4 = ?$, $n_8 = ?$, $\sum_{1}^{8} n_i = ?$ Let $\bar{y}_i$ and $\sigma_i^2$ represent 
mean and variance in the $i$th column. Find the mean and variance of 
each of the columns (1) to (8), first in $v$ units where $v = (y - 85)/10$. 
Check your answers with those given at the bottom of the table. 

Table 27 

[The body of this table, a two-way frequency table of $y$ against columns (1) to (8), 
is not legible in the source. Only the following column summaries can be read.] 

    $n_i$      $\bar{y}_i$      $\sigma_i^2$
     ...        67.86       106.12
     14         72.14       191.83
     32         81.87       246.48
     40         84.80       283.63
     55         85.73       257.65
     54         90.92       294.51
     35         95.57       222.53
     14        105.0         71.43

    $N = 260$
6. Using formulas (18a) and (14) find the mean $\bar{y}$ and variance $\sigma_y^2$ of the total 
distribution in Table 27 and check your answers with those given at the 
bottom of the last column. 

Hint. The student will observe that the means, $\bar{y}_i$, of the columns in 
Table 27 are the values denoted by $y$ in Exercise 3, page 97. The weighted 
mean of these mean values is the mean of the whole table. That is, 
from (18a), 

\[ \bar{y} = \frac{1}{N}\sum_{1}^{k} n_i\bar{y}_i = 87.31. \]

The answer 56.66 (Exercise 3) is the variance, $\sigma_{\bar{y}_i}^2$, of the means of the columns 
of Table 27 and is not to be confused with the variance $\sigma_y^2$ of the whole 
table. In using (14), $\sigma^2$ is the variance of the whole table, $\sigma_i^2$ is the variance 
of the $i$th column, and the expression $\sum_{1}^{k} n_i d_i^2$ equals $N\sigma_{\bar{y}_i}^2$ where 
$\sigma_{\bar{y}_i}^2$ is the variance of the means of the columns since now $d_i = \bar{y}_i - \bar{y}$. 

7. In Theorem V (§9) show that 

\[ \nu_2^{(i)} = \sigma_i^2 + d_i^2. \]

Hence prove that (19) may be derived from (14) by showing that (14) 
may be written as follows: 

\[ N\sigma^2 = \sum_{1}^{k} n_i(\sigma_i^2 + d_i^2). \]

8. (a) Derive the following relation from (18a), 

\[ \bar{x}_1 = \frac{1}{n_1}\left[N\bar{x} - \sum_{i=2}^{k} n_i\bar{x}_i\right]. \]

What does this formula become when $k = 2$? 

(b) Derive the following relation from (15), 

\[ \sigma_1^2 = \frac{1}{n_1}\left[N(\sigma^2 + \bar{x}^2) - n_2(\sigma_2^2 + \bar{x}_2^2)\right] - \bar{x}_1^2. \]

9. In a certain distribution of $N = 25$ measurements it was found that $\bar{x} = 56$ 
inches and $\sigma = 2$ inches. After these results were computed it was discovered 
that a mistake had been made in one of the measurements which 
was recorded as 64 inches. Find the mean and standard deviation if the 
incorrect variate, 64, is omitted. 

Hint. Let $n_1 = 24$, $n_2 = 1$. Then $\bar{x}_2 = 64$ and $\sigma_2 = 0$. To find $\bar{x}_1$ and 
$\sigma_1$ use formulas in Exercise 8 above. 

10. If two or more variates are deleted from a distribution for which $N$, $\bar{x}$, and $\sigma$ 
are given, show how to compute the mean and variance of the remaining 
variates. 
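The relations of Exercise 8 can be turned into a small routine for removing a sub-set from a distribution when only summary statistics are at hand. The Python sketch below is one possible arrangement; the function name and the data are hypothetical, and the result is compared with a direct computation on the remaining variates.

```python
from statistics import fmean, pvariance

def remove_subset(N, mean, var, n2, mean2, var2):
    """Mean and variance of the n1 = N - n2 variates left after a sub-set is removed.

    Uses the relations of Exercise 8 with k = 2: the first from (18a),
    the second from (15).
    """
    n1 = N - n2
    mean1 = (N * mean - n2 * mean2) / n1
    var1 = (N * (var + mean ** 2) - n2 * (var2 + mean2 ** 2)) / n1 - mean1 ** 2
    return n1, mean1, var1

# Hypothetical data: eight variates, of which the last is to be deleted.
full = [2, 4, 4, 4, 5, 5, 7, 9]
deleted = [9]
remaining = full[:-1]

n1, mean1, var1 = remove_subset(len(full), fmean(full), pvariance(full),
                                len(deleted), fmean(deleted), pvariance(deleted))

print(n1, round(mean1, 4), round(var1, 4))
print(round(fmean(remaining), 4), round(pvariance(remaining), 4))
```

The same routine applies when several variates are deleted, provided their own mean and variance are supplied.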
11. Consider a composite set consisting of $k$ sub-sets and let $\sigma_i^2$ and $n_i$ denote, 
respectively, the variance and number of variates in the $i$th sub-set, 
and $N = \sum_{1}^{k} n_i$. 

(a) If the sub-sets have equal means, show that the variance of the composite 
set is given by 

\[ \sigma^2 = \frac{1}{N}\sum_{1}^{k} n_i\sigma_i^2. \]

(b) If the sub-sets each contain the same number of variates and have equal 
means, show that 

\[ \sigma^2 = \frac{1}{k}\sum_{1}^{k} \sigma_i^2. \]






CHAPTER VI 


TYPES OF DISTRIBUTIONS. THE NORMAL CURVE 

1. Skewness and Kurtosis. The shapes of frequency distributions 
are not all alike. Unimodal distributions may differ in two ways 
with respect to form. These differences can be described more easily 
if we think in terms of frequency curves. The curve may be quite 




symmetrical, or it may be skew, bulging out on one side more than 
on the other. Secondly, the top of the curve may be narrow and 
peaked, or it may be somewhat flat giving a mound-shape effect. 

The mean and standard deviation are not sufficient to detect these
characteristics, so we need other measures to describe them. Consider,
for example, the three distributions of the weights (in class
units) 1 of different breeds of mice 120-130 days old given in Table 28.
Experiments on mice are important in cancer research. These distributions
are, however, somewhat fictitious, being adapted from some actual data
for purposes of illustration.

Table 28

            A      B      C
    u       f      f      f
   -3       0      1      0
   -2       3      1      1
   -1       6      5     10
    0       7     11      6
    1       6      5      5
    2       3      1      2
    3       0      1      1
  Sums     25     25     25

The student may easily verify that for each of these distributions
we find the same mean and standard deviation, namely,

$\bar{u} = 0$, $\sigma_u = 1.2$.

One may see from their histograms (Fig. 20) that these distributions
are essentially different in shape even though they all have the
same mean and standard deviation. These differences would
be more pronounced if N were so large that the shapes approached
a regular and smooth form. Such a large group is called the
“population” or “universe,” and the group of N values that
we usually have at hand is a “sample.”

1 Neither the original units nor the class interval need concern us here.

Fig. 20

Lack of symmetry in a distribution is known as “skewness.”
This characteristic is measured by $\alpha_3$. If a distribution is symmetrical
$\alpha_3 = 0$, but $\alpha_3$ may be positive or negative depending upon whether
the long tail of the distribution extends to the right or the left of the
mean. (See Figure 18.)

Figure 19 exhibits curves with different degrees of flatness or
peakedness. The flatness that we are now describing is in the
neighborhood of the mode and is not to be confused with the flatness
of a curve as a whole which is due to spread or dispersion.
The curves in Figure 19 all have the same spread. So their flatness
depends upon the relative amount of material in the vicinity of the
mode. This characteristic of a curve is called “kurtosis” and is
measured by $\alpha_4$. By the calculus it can be demonstrated that $\alpha_4 = 3$
for a certain type of distribution which is called the normal curve.
A frequency curve is said to have positive kurtosis if $\alpha_4 > 3$ and
negative kurtosis if $\alpha_4 < 3$. It seems, however, that any combination
of kurtosis and peakedness may occur. 1 The values of $\alpha_3$ and
$\alpha_4$ computed for an observed distribution are useful in selecting the
curve which will best represent the type to which that distribution
belongs.

Both $\alpha_3$ and $\alpha_4$ are abstract numbers and therefore skewness and
kurtosis in different distributions may be compared by these measures.
Therefore our definitions are

(1)    $\alpha_3$ is a measure of skewness,
       $\alpha_4$ is a measure of kurtosis.
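The claims made about Table 28 can be checked numerically. The following is a minimal sketch in Python, assuming the moment-ratio forms $\alpha_3 = \mu_3/\sigma^3$ and $\alpha_4 = \mu_4/\sigma^4$, where $\mu_k$ is the $k$th moment about the mean; the function name `shape_measures` is chosen here only for illustration.

```python
import math

# Frequencies from Table 28, listed for the class units u = -3, ..., 3.
u_values = [-3, -2, -1, 0, 1, 2, 3]
table_28 = {
    "A": [0, 3, 6, 7, 6, 3, 0],
    "B": [1, 1, 5, 11, 5, 1, 1],
    "C": [0, 1, 10, 6, 5, 2, 1],
}


def shape_measures(u, f):
    """Return (mean, sigma, alpha_3, alpha_4) of a frequency distribution."""
    n = sum(f)
    mean = sum(fi * ui for ui, fi in zip(u, f)) / n

    def mu(k):
        # k-th moment about the mean
        return sum(fi * (ui - mean) ** k for ui, fi in zip(u, f)) / n

    sigma = math.sqrt(mu(2))
    return mean, sigma, mu(3) / sigma ** 3, mu(4) / sigma ** 4


results = {name: shape_measures(u_values, f) for name, f in table_28.items()}
for name, (mean, sigma, a3, a4) in results.items():
    print(f"{name}: mean={mean:.3f} sigma={sigma:.3f} "
          f"alpha_3={a3:.3f} alpha_4={a4:.3f}")
```

All three distributions give the same mean and standard deviation, so any difference the sketch reports lies entirely in the two shape measures.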

1 I. Kaplansky, “A Common Error Concerning Kurtosis,” J. Amer. Stat.
Assoc., vol. 40, p. 259, June 1945. In this connection, Professor I. W. Burr
comments: “The shape of the hump of the curve has less influence on $\alpha_4$ than
does the length of the tails. In Figure 19, the curve with $\alpha_4 = 4.5$ should have
the longest tails.”

For an unsymmetrical distribution the distance between the mean
and mode may be used to measure the degree of asymmetry or skewness,
because the mean and mode coincide in a symmetrical distribution.
Since we wish any measure of skewness to be a pure number,
we would express this distance in units of the standard deviation,
thus (mean − mode)/$\sigma$. Now it happens that there is a certain
curve known as Pearson’s Type III which is used to represent certain
skew distributions, and it can be shown by higher mathematics that,
for this curve,

(2)    $\dfrac{\text{mean} - \text{mode}}{\sigma} = \dfrac{\alpha_3}{2}.$

So this relation 1 may be used as a formula for obtaining the approximate
mode.

Exercise 

Find $\alpha_3$ and $\alpha_4$ for each of the distributions A, B, and C in Table 28.

2. Frequency Curves. As the student extends his experience he 
finds several types of distributions. It is important in certain prob- 
lems to differentiate between them. Differences in type lead to the 
study of frequency curves. There are several standard curves to 
represent the different types of distributions that arise in practical 
statistics. 2 Each of these is specified by a mathematical function
$y = f(x)$ where $f(x)$ is a general symbol for any function of $x$. It is,
of course, a different expression for each of the different curves.



Such functions are also called distribution functions. A complete 
discussion of this subject belongs to the field of advanced statistics. 
However, there are some simple concepts relating to frequency 
curves which will be useful in our work. 

If a frequency curve is used to represent a given distribution, the
total area under the curve corresponds to the total frequency N,
and therefore the partial area under the curve between the ordinates
erected at $x = a$ and $x = b$ (Figure 21) represents the number of
variates with measurement or character between $a$ and $b$. The limits
between which the theoretical distribution ranges are denoted by
$l_1$ and $l_2$. It is often convenient and causes no loss of generality to
suppose that the total area under the curve is unity or 100%, in
which case the partial area between $a$ and $b$ represents the percentage
of variates having the given character.

In mathematical language the “area under $f(x)$ between $a$ and $b$”
is called the “integral of $f(x)$ from $a$ to $b$,” and is denoted by the
symbol

$$\int_a^b f(x)\,dx.$$

However, we will abbreviate this symbol and use merely $\int_a^b$ to denote
such an area.

1 Because of this relation some writers use $\alpha_3/2$ as a measure of skewness
instead of $\alpha_3$. Also some authors adopt a different convention as to sign, defining
skewness as negative when the mean is greater than the mode.

2 See Chapter III, Part II.

Without attempting to be rigorous, we may say that the total area
under the curve is the limit of the area of the appropriate histogram
whose rectangles have bases $\Delta x$ and altitudes $f(x)$, as $\Delta x$ is taken
smaller and smaller and approaches zero. Thus

$$\int f(x)\,dx = \lim_{\Delta x \to 0} \sum f(x)\,\Delta x.$$

The integral sign $\int$ is a conventionalized S and denotes the sum
of elements of area with bases $dx$ and altitudes $y = f(x)$. The letters
written at the top and bottom of $\int$ denote the range over which
the sum is taken. Thus

$$\int_a^b y\,dx \quad \text{or} \quad \int_a^b f(x)\,dx$$

represents the area which is bounded by the curve $y = f(x)$, the
ordinates at $x = a$ and $x = b$, and the $x$-axis. (Figure 21.)

The integral of $y = f(x)$ from $l_1$ to $l_2$ denotes the total frequency N.
Therefore,

$$N = \int_{l_1}^{l_2}.$$

Hence, the proportion of variates having some character $x$, such that
$a < x < b$, is given by $\dfrac{1}{N}\int_a^b$. If N is taken as unity or 100%, then
$\int_a^b$ denotes the percentage of variates having the given character.

The integral represented by this symbol also denotes the probability
that a variate chosen at random from the universe $y = f(x)$ will
have a value between $a$ and $b$.
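The limiting process described above may be imitated on a computing machine. The sketch below, in Python, sums rectangles of base $\Delta x$ under a curve and lets the number of rectangles grow. The curve used as $f(x)$ is the standard normal ordinate taken up later in this chapter, chosen only because its area from 0 to 1 is quoted there as .3413; the function names are illustrative assumptions.

```python
import math


def phi(t):
    """Standard normal ordinate, used here only as a convenient f(x)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def histogram_area(f, a, b, n_rect):
    """Sum of f(x) * dx over n_rect rectangles of equal base on (a, b)."""
    dx = (b - a) / n_rect
    # Each rectangle takes its altitude at the mid-point of its base.
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n_rect))


# The area settles down as the bases are taken smaller and smaller.
areas = {n: histogram_area(phi, 0.0, 1.0, n) for n in (2, 10, 100, 1000)}
for n, area in areas.items():
    print(n, round(area, 5))
```

With unit total area, the value so obtained is at once a relative frequency and a probability, as stated in the preceding paragraph.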

3. The Normal Curve. Perhaps the most important of all frequency
curves is the so-called normal 1 curve whose equation may be
written

(3)    $y = K e^{-h^2 (x - m)^2},$

where $K$, $h^2$, and $m$ represent numbers whose significance will be
explained presently. The curve is bell-shaped and is symmetrical 
about the line x = m. It was first discovered by a famous French 
mathematician, De Moivre, over two hundred years ago and pub- 
lished in 1733. He obtained it while working on certain problems 
in games of chance which were proposed to him by the gamblers of 
his day. Because of this origin and because the data from certain 
coin- and dice-throwing experiments closely approach it in form, it 
is often called the normal probability curve. Actual statistical use 
of the normal curve began with the work of the famous mathematical 
astronomers, Laplace (1749-1827) and Gauss (1777-1855), each of 
whom derived it independently and presumably without knowing of 
De Moivre’s treatment. 2 They found that it represented very well 
the errors of observation in the physical sciences. For this reason 
it has been called the normal curve of error, where error is used in 
the sense of a deviation from the true value. Since that time experi- 
ence has shown that it serves quite well to describe many of the dis- 
tributions which arise in the fields of biology, education, and sociology. 
Much of the theory of statistics is built around it. 

1 The term “normal” used here should not be interpreted to mean that other
types of distribution are abnormal.

2 For a more extensive history see (a) “Bi-centenary of the Normal Curve,”
Jour. Amer. Statistical Assoc., vol. 29 (1934), pp. 72-75. (b) “Mathematical
Statistics” (Carus Monograph), Rietz, Ch. 3.

The calculus is required to define the moments of a theoretical
distribution specified by a frequency curve $y = f(x)$. (These definitions
are given in Part II.) It turns out that the mean of the distribution
specified by (3) is $m$ and its variance is $1/(2h^2)$. The
constant $K$ is determined so that the area under the curve shall have
some relevant value. In describing an observed distribution by
means of a normal curve, we wish to have the number of area units
under the curve (3) equal to the number N of observed variates.
When this condition is imposed, $K = Nh/\sqrt{\pi}$ and we see that $K$
depends also on $h$. If we adopt the same notation 1 here as we used
for an observed distribution, we have

$$m = \bar{x}, \qquad h^2 = \frac{1}{2\sigma_x^2}, \qquad K = \frac{N}{\sigma_x \sqrt{2\pi}}.$$

Upon making these replacements, (3) becomes

(3a)    $y = \dfrac{N}{\sigma_x \sqrt{2\pi}}\, e^{-(x - \bar{x})^2 / 2\sigma_x^2}.$
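That (3) and (3a) describe the same curve may be confirmed by direct substitution. The following Python sketch evaluates both forms at a few abscissas; the numerical values of N, the mean, and the standard deviation are assumed purely for illustration and do not come from the text.

```python
import math


def normal_curve_3(x, K, h, m):
    """Equation (3): y = K * exp(-h^2 (x - m)^2)."""
    return K * math.exp(-(h ** 2) * (x - m) ** 2)


def normal_curve_3a(x, N, mean, sigma):
    """Equation (3a): y = N / (sigma sqrt(2 pi)) * exp(-(x - mean)^2 / (2 sigma^2))."""
    coefficient = N / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))


# Illustrative values only (an assumption of this sketch).
N, mean, sigma = 1000, 50.0, 5.0
h = 1 / (math.sqrt(2) * sigma)     # from variance = 1 / (2 h^2)
K = N * h / math.sqrt(math.pi)     # from the condition that the area equals N

xs = [35.0, 45.0, 50.0, 52.5, 60.0]
pairs = [(normal_curve_3(x, K, h, mean), normal_curve_3a(x, N, mean, sigma))
         for x in xs]
for x, (y3, y3a) in zip(xs, pairs):
    print(x, round(y3, 4), round(y3a, 4))
```

The two columns printed agree, which is merely the statement that $K = Nh/\sqrt{\pi}$ and $K = N/(\sigma_x\sqrt{2\pi})$ are the same number when $h^2 = 1/(2\sigma_x^2)$.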


4. Standard Form. The letters $\pi$ and $e$ represent numbers which
always have the same values (see §1, Chapter I). But each of the
letters $m$, $h$, and $K$ may take on different values in different situations.
Such constants are called parameters, and (3) really represents
a family of curves. Similarly, in (3a), $\bar{x}$, $\sigma_x$, and N are
parameters. For assigned values they determine, respectively, the
position of the curve along the $x$-axis, its steepness, and its “size”
but they do not have anything to do with its fundamental characteristics
(i.e., those properties which differentiate it from all other
curves). In order to study these characteristic properties it is
convenient to represent the curve by an equation which will be independent
of the parameters; in other words, to eliminate them from
the equation by a transformation. This is accomplished by considering
the total area under the curve as unity, taking the origin at
the mean, and using the standard deviation as the unit of horizontal
measurement. In mathematical language this means that we set
$N = 1$, and $t = (x - \bar{x})/\sigma_x$. We will denote the resulting function
by $\phi(t)$, that is,

(4)    $\phi(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2},$

which is called the standard form of the normal curve.

A variable, $t$, which is distributed in accord with (4) is said to be
normally distributed with mean zero and unit standard deviation.

Just as coordinates of points on the curve are denoted by $(x, y)$
in the case of equation (3a), so in equation (4) $t$ refers to abscissas and
$\phi(t)$ refers to ordinates. The relation between the two systems of
coordinates is given by

(5)    $x = t\sigma + \bar{x}$

for abscissas, and

(6)    $y = \dfrac{N}{\sigma}\,\phi(t)$

for ordinates. Equation (6) follows from (3a) and (4). If the area
under the curve is taken as unity, then $y = \phi(t)/\sigma$, that is, $\phi(t) = \sigma y$.
This says that since the abscissas are compressed by $\sigma$ in changing
from arbitrary units into standard units, so the ordinates must be
stretched by $\sigma$ if the area under the curve is to be the same in the two
scales of measurement.

1 In the theory of sampling, Part II, it is necessary to distinguish the moments
of a sample from those of the parent universe by the use of different symbols.
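The passage between the two systems of coordinates can be sketched in a few lines of Python. The function names and the numerical values of N, the mean, and the standard deviation below are assumptions made for illustration; only equations (4), (5), and (6) come from the text.

```python
import math


def phi(t):
    """Equation (4): the standard form of the normal curve."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def to_standard(x, mean, sigma):
    """t = (x - mean) / sigma."""
    return (x - mean) / sigma


def to_x(t, mean, sigma):
    """Equation (5): x = t * sigma + mean."""
    return t * sigma + mean


def to_y(t, N, sigma):
    """Equation (6): y = (N / sigma) * phi(t)."""
    return N / sigma * phi(t)


# Illustrative values only (an assumption of this sketch).
N, mean, sigma = 1000, 50.0, 5.0
x = 57.5
t = to_standard(x, mean, sigma)
y_from_6 = to_y(t, N, sigma)
# The same ordinate computed directly from (3a), for comparison.
y_direct = (N / (sigma * math.sqrt(2 * math.pi))
            * math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)))
print(t, round(y_from_6, 4), round(y_direct, 4))
```

Because every normal curve reduces to (4) in this way, one table of $\phi(t)$ serves for all of them.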

5. Tables of Standard Ordinates and Areas. One of the reasons
for writing the equation in standard form is that the ordinates and
areas may be tabulated once and for all. These tables are given in
the Appendix. We see from (4) that $\phi(-t) = \phi(+t)$, i.e., the ordinates
for negative values of $t$ are the same as for the corresponding
positive values of $t$, and the curve is symmetrical about the ordinate
at $t = 0$. Therefore it is necessary to tabulate values of $\phi(t)$ for
positive $t$’s only. Equation (4) may be graphed by plotting the
points corresponding to a few well chosen values from the tables
and drawing a smooth curve through them. (Figure 22.)

Fig. 22

The curve approaches very close to the horizontal axis at each
extremity but is asymptotic, that is, it does not quite touch the axis
no matter how far extended. We say its limits are at $-\infty$ and
$+\infty$. Although the infinite abscissal range is never met in practice
it may be characteristic of the “universe” from which a given
distribution is a sample. Therefore, this infinite feature is useful in
theoretical investigations. Moreover, even in representing observed
distributions the infinite range causes no practical difficulty because
the curve comes down to the horizontal axis very rapidly beyond
$t = \pm 3$. The combined area at each extremity beyond $t = \pm 3$ is
only .27 of 1% of the total area under the curve.

Partial areas between ordinates erected at various values of $t$, say
between $t = a$ and $t = b$, are denoted by $\int_a^b$. Thus the area from
$t = 0$ to $t = 1$ is given by $\int_0^1 = .3413$. (See Table I, Appendix.)
Since the total area under $\phi(t)$ is taken as unity the area on either
side of $t = 0$ is 0.5 and it is only necessary to tabulate the areas $\int_0^t$
for positive values of $t$. Thus the area from $t = -1$ to $t = 0$ is equal
to the area from $t = 0$ to $t = 1$. In symbols this would be stated
as follows:

$$\int_{-1}^{0} = \int_0^1.$$

Any other areas required may be found by an appropriate addition
or subtraction of tabular values. For example, suppose the area
below $t = -2$ is required. This is denoted by $\int_{-\infty}^{-2}$. Now the area
from $-\infty$ to $-2$ equals 0.5 minus the area from $-2$ to 0. And the
area from $-2$ to 0 is the same as from 0 to 2. That is,

$$\int_{-\infty}^{-2} = .5 - \int_{-2}^{0}. \quad \text{But} \quad \int_{-2}^{0} = \int_0^2 = .4772.$$

$$\therefore \int_{-\infty}^{-2} = .5 - .4772 = .0228.$$

Both areas and ordinates for decimal values of $t$ between tenths may
be approximated by interpolating between the values given in the
tables.

The illustrative examples following §6 will help the student
become familiar with the tables. He should verify the answers and
draw a simple sketch of the curve showing the ordinates or areas in
each case.

The symbol $\int_{-\infty}^{t}$ denotes a cumulative relative frequency, i.e.,
the percentage of the total frequency N which is less than $t$. In order
to find values of $\int_{-\infty}^{t}$ from the tables, for assigned values of $t$, the
student should observe (from a figure) that

$$\int_{-\infty}^{t} = .5 \pm \int_0^{|t|},$$

the plus or minus sign to be used according as $t$ is positive or negative.
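The additions and subtractions of tabular values described in this section may be sketched as follows. In place of Table I, the Python sketch below obtains the area from 0 to $t$ from the error function `math.erf` of the standard library, using the identity that this area equals $\tfrac{1}{2}\,\mathrm{erf}(t/\sqrt{2})$; that substitution, and the function names, are assumptions of the sketch rather than the procedure of the text.

```python
import math


def area_0_to(t):
    """Area under phi(t) from 0 to t (t >= 0), the quantity tabulated in Table I."""
    return 0.5 * math.erf(t / math.sqrt(2))


def area_below(t):
    """Cumulative area from minus infinity to t: .5 plus or minus the tabular area."""
    if t >= 0:
        return 0.5 + area_0_to(t)
    return 0.5 - area_0_to(-t)


def area_between(a, b):
    """Area between the ordinates at t = a and t = b."""
    return area_below(b) - area_below(a)


print(round(area_0_to(1), 4))           # area from t = 0 to t = 1
print(round(area_below(-2), 4))         # area below t = -2
print(round(2 * area_below(-3), 4))     # combined area beyond t = -3 and t = +3
```

Only the positive half of the table is ever consulted; symmetry supplies the rest, exactly as in the worked case of the area below $t = -2$.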

6. Properties. A knowledge of the properties of the normal curve 
is essential for an intelligent use of the curve in practical statistics. 
A demonstration of some of these properties is beyond the scope of 
the present discussion although quite simple in the calculus. The 
following properties are the most important and interesting. 

1. The mean, median, and mode coincide at $t = 0$. The height
of the maximum ordinate in standard form is $1/\sqrt{2\pi}$ because when
$t = 0$, $\phi(t) = 1/\sqrt{2\pi} = .3989$.

2. Since the standard deviation is the unit of measurement along
the horizontal axis, $\sigma_x = 1$ in the $t$ scale. Any $t$ value may be converted
into the corresponding $x$ value by (5). In the vertical direction
$N/\sigma$ is the unit of measurement and any $\phi(t)$ ordinate may be
converted into $y$ units by means of (6).

The area under (3) in the range from $x = c$ to $x = d$ is denoted by

$$\int_c^d y\,dx.$$

If $t = a$ and $t = b$ denote the corresponding range in standard units,
then

$$\int_a^b \phi(t)\,dt$$

denotes the corresponding area, in standard units, under (4). It is
shown in the calculus that $dx = \sigma_x\,dt$. Therefore from (6) we have

(8)    $\int_c^d y\,dx = N \int_a^b \phi(t)\,dt.$

If the interval goes from $x = c$ to $x = d$, (8) says that

(9)    Frequency over $(c, d) = N \int_a^b$

where

(10)    $a = (c - \bar{x})/\sigma_x, \qquad b = (d - \bar{x})/\sigma_x.$

This merely means that the percentages (relative frequencies) obtained
from the tables may be converted into numbers (frequencies)
by multiplying the percentages by N.

3. The curve changes from concave to convex at $t = \pm 1$. In the
$x$-scale, referred to the origin of $x$, these points are at $x = \bar{x} \pm \sigma_x$.
They are called points of inflection and their position is important
in making an accurate drawing of the curve.

4. The standard deviation is approximately 25% greater than
the mean deviation. More precisely, $MD = \sigma\sqrt{2/\pi} = .798\sigma$.
($\sigma = 1.2533\,MD$.)


5. The quartiles, $Q_1$ and $Q_3$, are equidistant from $t = 0$ and therefore
from the mean. By definition

$Q_3$ is that value of $t$ for which $\int_{-\infty}^{t} = .75$,

i.e., for which $\int_0^t = .25$. From the tables this is $t = .6745$. Therefore
in arbitrary units,

$Q_3 = \bar{x} + .6745\sigma_x$ and $Q_1 = \bar{x} - .6745\sigma_x$.

6. The quartile deviation (semi-interquartile range) for a normal
distribution will be denoted by $E$. Its value is

$$E = \frac{(\bar{x} + .6745\sigma) - (\bar{x} - .6745\sigma)}{2} = .6745\sigma.$$

In standard units this is $s = E/\sigma = .6745$.

7. The quantity $E$ (or $s$) has a significance in probability theory.
If a variable $x$ is distributed according to the normal curve, the
probability is one half that a variate selected at random will have a
value between $\bar{x} - E$ and $\bar{x} + E$. The reason for this statement is
that 50% of the variates have values within this range. $E$ is commonly,
though somewhat ambiguously, called “probable error.”

8. $E$ is in units of $x$ whereas $s$ is a value of $t$, that is, $s$ is the value
$t = .6745$, and $E$ is the value $x' = .6745\sigma_x$. Just as $\sigma_x$ may be used
as a yardstick in scaling off a distribution on either side of the mean
(§6, Chapter V), so may $E$ or $s$ be used in a similar manner. When
thinking of them in this way it is useful to regard $E$ as a yardstick
about two-thirds the length of $\sigma_x$. The following table gives the
end-points of certain intervals in $t$, $x'$, and $x$ units, respectively, where
$t = x'/\sigma_x$ and $x' = x - \bar{x}$.

End Points of Certain Intervals in t, x', x

        When $\sigma$ is the unit                      When $E$ is the unit
   t       x'            x                  t          x'                x
   0       0             $\bar{x}$                  0          0                 $\bar{x}$
  ±1      ±$\sigma$          $\bar{x} \pm \sigma$           ±.6745     ±.6745$\sigma$         $\bar{x} \pm .6745\sigma$
  ±2      ±2$\sigma$         $\bar{x} \pm 2\sigma$          ±1.349     ±1.349$\sigma$         $\bar{x} \pm 1.349\sigma$
  ±3      ±3$\sigma$         $\bar{x} \pm 3\sigma$          ±2.023     ±2.023$\sigma$         $\bar{x} \pm 2.023\sigma$


The percentage distribution of area under the normal curve is
given (approximately) in Figure 23 where $\sigma_x$ is the unit of measurement
along the horizontal axes and in Figure 24 where $s$ is the unit.
The percentages given in the figures may be regarded as abridged
tables. Of course the tables in the Appendix will ordinarily be used
in problems.

With reference to Figure 23, it is sometimes said that if values of $x$
are normally distributed, the probability that a value chosen at
random will fall within the range $x_1 < x < x_2$, where $x_1 = \bar{x} - \sigma_x$
and $x_2 = \bar{x} + \sigma_x$, is .68.

Fig. 23        Fig. 24

9. Astronomers and physicists have called $h$ the “modulus of
precision.” From the relation $h = 1/(\sqrt{2}\,\sigma)$, it is evident that $h$
increases as $\sigma$ decreases. And as $h$ increases, the curve (with N and
$m$ kept constant) becomes narrower in the neighborhood of $m$ and
in this sense $h$ measures the closeness of the values of $x$ to their mean.

10. The curve is symmetrical and $\alpha_3 = 0$. The fourth moment
about the mean is equal to three times the square of the second
moment about the mean, i.e., $\mu_4 = 3\mu_2^2$ and therefore $\alpha_4 = \mu_4/\mu_2^2 = 3$.
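Several of the ten properties listed above can be checked numerically without the calculus. The sketch below uses the `NormalDist` class of the Python standard library for the ordinates and areas of the standard curve, and a plain sum of narrow rectangles for the moments; the helper `expectation`, its range of ten standard deviations, and its number of rectangles are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

std = NormalDist()  # mean 0, standard deviation 1, i.e. the curve (4)

max_ordinate = std.pdf(0.0)                           # property 1
q3 = std.inv_cdf(0.75)                                # properties 5 and 6
within_one_sigma = std.cdf(1) - std.cdf(-1)           # the .68 quoted with Figure 23
within_probable_error = std.cdf(q3) - std.cdf(-q3)    # property 7


def expectation(g, half_width=10.0, n_rect=20_000):
    """Rectangle sum of g(t) * phi(t) over (-half_width, half_width)."""
    dt = 2 * half_width / n_rect
    total = 0.0
    for i in range(n_rect):
        t = -half_width + (i + 0.5) * dt   # mid-point of the i-th base
        total += g(t) * std.pdf(t) * dt
    return total


mean_deviation = expectation(abs)                # property 4, in units of sigma
expected_md = math.sqrt(2 / math.pi)
alpha_3 = expectation(lambda t: t ** 3)          # property 10 (sigma = 1)
alpha_4 = expectation(lambda t: t ** 4)          # property 10 (sigma = 1)

print(round(max_ordinate, 4), round(q3, 4), round(mean_deviation, 4))
print(round(within_one_sigma, 4), round(within_probable_error, 4))
print(round(alpha_3, 4), round(alpha_4, 4))
```

Since the standard curve has unit standard deviation, the third and fourth moments computed here are themselves $\alpha_3$ and $\alpha_4$.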


Examples 

1. Find the ordinates of $\phi(t)$ for (a) $t = 2.3$, (b) $t = -2.3$, (c) $t = .67$.

Solutions from the tables in the Appendix:

(a) $\phi(2.3) = .02833$
(b) $\phi(-2.3) = .02833$
(c) $\phi(.67) = .31874$

2. Find the following areas under $\phi(t)$ and use the integral notation:

(a) From $t = 0$ to $t = 3.00$
(b) From $t = 1.5$ to $t = 2.5$
(c) From $t = -2$ to $t = 1.3$
(d) From $t = 0$ to $t = 0.6745$

Solutions from the tables:

(a) The required area is given by $\int_0^3$, which we find to be .49865.

(b) The area from $t = 0$ to $t = 1.5$ is $\int_0^{1.5} = .43319$, and from $t = 0$ to
$t = 2.5$ is $\int_0^{2.5} = .49379$.

$$\therefore \int_{1.5}^{2.5} = \int_0^{2.5} - \int_0^{1.5} = .0606.$$

(c) Since the area from $t = 0$ to $t = -2$ is the same as from $t = 0$ to $t = +2$
we have

$$\int_{-2}^{1.3} = \int_0^{2} + \int_0^{1.3} = .47725 + .40320 = .88045.$$

(d) Here we must interpolate:

For $t = .67$,      $\int_0^t = .24857$
For $t = .6745$,    $\int_0^t = A$ (say)
For $t = .68$,      $\int_0^t = .25175$.

Therefore

$$\frac{A - .24857}{.25175 - .24857} = \frac{.0045}{.01},$$

whence

$A = .25$.
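The interpolation of Example 2(d) is simple proportion, and may be sketched in Python as below. The function name `interpolate` is illustrative; the two tabular areas are those quoted in the example.

```python
def interpolate(t, t_lo, area_lo, t_hi, area_hi):
    """Linear interpolation between two adjacent tabular entries."""
    fraction = (t - t_lo) / (t_hi - t_lo)
    return area_lo + fraction * (area_hi - area_lo)


# Tabular areas from 0 to t quoted in Example 2(d).
area = interpolate(0.6745, 0.67, 0.24857, 0.68, 0.25175)
print(round(area, 4))
```

The fraction .0045/.01 of the tabular difference is added to the smaller entry, which is the same step as solving the proportion for $A$.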


3. Show that for equation (3), the percentages of area outside the given ranges
are as stated below:

Above $\bar{x} + \sigma$ = 15.87%
Outside $\bar{x} \pm \sigma$ = 31.74%
Outside $\bar{x} \pm 2\sigma$ = 4.56%
Outside $\bar{x} \pm 3\sigma$ = 0.27%

Solution: Converting these ranges into $t$ units, and remembering that only
the positive half of the area under $\phi(t)$ is tabulated and equals .5, we have

Area above $t = 1$ is $.5 - \int_0^1 = .1587 = 15.87\%$
Area outside $t = \pm 1$ is $2(15.87\%) = 31.74\%$
Area outside $t = \pm 2$ is $2\left(.5 - \int_0^2\right) = .0456 = 4.56\%$
Area outside $t = \pm 3$ is $2\left(.5 - \int_0^3\right) = .0027 = 0.27\%$


4. Given $N = 1500$, $\bar{x} = 75$, $\sigma_x = 10$. If the variates are distributed according
to the normal curve, (a) find the value of $x$ for which cum $f$ = 800, (b) for
which cum $f$ = 450, (c) how many of the N variates lie where $x < 80$?

Solutions:

(a) By definition, cum $f = \int_{-\infty}^{x} y\,dx$, and from (8), $\int_{-\infty}^{x} y\,dx = N \int_{-\infty}^{t}$.

$\therefore\ 800 = 1500 \int_{-\infty}^{t}$, i.e., $\int_{-\infty}^{t} = .5333$.

Since $\int_{-\infty}^{t} = .5 + \int_0^t$, we have $\int_0^t = .0333$,

whence from the tables,

$t = .083$.

Substituting in equation (5),

$x = 75.83$.

(b) We have $\int_{-\infty}^{t} = 45/150 = .3$ and $t$ is negative.

Since $\int_{-\infty}^{t} = .5 - \int_0^{|t|}$, we have $\int_0^{|t|} = .2$,

whence we find that

$t = -.524$

so

$x = 69.76$.

(c) From the relation $t = (x - \bar{x})/\sigma_x$ we find that
$t = .5$ when $x = 80$.

From the tables,

$\int_{-\infty}^{.5} = .69146$.

From (8) we have

$\int_{-\infty}^{80} y\,dx = 1500(.69146) = 1037.2$.
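Example 4 can also be worked by machine. The sketch below relies on the `NormalDist` class of the Python standard library, whose `inv_cdf` method returns the abscissa below which a stated proportion of the area lies; using it in place of the tables is an assumption of this sketch, not the method of the text.

```python
from statistics import NormalDist

N = 1500
dist = NormalDist(mu=75, sigma=10)

x_a = dist.inv_cdf(800 / N)   # (a) value of x below which 800 variates lie
x_b = dist.inv_cdf(450 / N)   # (b) value of x below which 450 variates lie
count_c = N * dist.cdf(80)    # (c) number of variates with x < 80

print(round(x_a, 2), round(x_b, 2), round(count_c, 1))
```

The machine values may differ from the tabular answers in the last figure, since the answers above were obtained from $t$ values carried to three decimals only.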

Exercises 

1. Find $\phi(2.65)$, $\phi(-1.46)$, $\phi(0)$.

2. Find $t$ if $\phi(t) = .1257$, $.0325$, $.0034$, respectively.

3. Find the following areas under $\phi(t)$ and draw a figure in each case:


(a) f > r, r, f > f 

JO J- 1.2 J- oo J 1.2 J- 1.2 

(5 } r s7 > r 

J— .37 J-.t 


"*.37 /\6745 

-.37 t/-.6745 
4. Find $t$, given the partial areas:

$2\int_0^t = .5$,  $\int_0^t = .27457$,  $\int_{-t}^{t} = .999730$.




5. Verify the percentages given in Figures 23 and 24.

6. (a) How far from the median of a normal distribution is the first quartile?

(b) In a certain normal distribution $\bar{x} = 89$ and $Q_1 = 75.51$. What is $\sigma_x$?

7. For a normal distribution: $N = 1000$, $\bar{x} = 20$, $\sigma_x = 2$.

(a) What is $E$?

(b) Find the value of $Q_3$.

(c) What values of $x$ will include the middle 500?

(d) The middle 75%?

8. If $N = 300$, $\bar{x} = 75$, $\sigma_x = 15$, for a normal distribution:

(a) What is the value of the first quartile?

(b) The third quartile?

(c) How many variates are between $x = 60$ and $x = 90$?

9. In a college the 8 grades A, A−; B, B−; C, C−; D, and F are given.
On the assumption that mathematical ability is normally distributed,
how many out of a total of 1000 should receive each grade? Assume
that $\bar{x}$ is the boundary between the C and B− grades and that each grade
interval is $.8\sigma$. What range in standard units on either side of $\bar{x}$ is thereby
assumed to include all the grades?

10. What are the percentages of a normal distribution outside $\bar{x} \pm t\sigma$ for
$t = 1, 2, 3$?


7. Curve Fitting. It should be remembered that a set of data 
collected and presented in the form of a frequency distribution is 
merely a sample of a general type called its universe. Other samples 
from that universe might yield somewhat different frequency distri- 
butions. 

For certain purposes it may be desirable to fit a normal curve to 
a unimodal distribution which is reasonably symmetrical and appears 
to be of the normal type. The theoretical curve idealizes the recal- 
citrant observational data and smooths out the irregularities due to 
sampling fluctuations. 

In fitting equation (3a) to a given distribution, we assume that

(1) The given frequency N represented by a histogram equals the area
under the curve, and

(2) The mean and standard deviation of the observed distribution
equal, respectively, the mean and standard deviation of the theoretical
distribution represented by the curve.

A normal curve is a mathematical model of a hypothetical universe.
In identifying such a universe with (3a) only its form is
specified by the model. The parameters are (usually) unknown.
An estimate of a parameter by the use of an appropriate function of
the observed data is called a statistic. Assumption (2) above means,
then, that we replace each of the parameters by the corresponding
statistic. 1

The procedure of fitting a normal curve to an observed distribution
will now be illustrated with the data of Table 21, p. 88. We
substitute

$\bar{x} = 47.712$
$\sigma_x = 5.772$
$N = 1000$

in equation (3a), and obtain

$$y = \frac{1000}{5.772\sqrt{2\pi}}\, e^{-\frac{(x - 47.712)^2}{2(5.772)^2}}.$$

To make use of a table of standard ordinates in graphing this
equation we transform it into standard units by setting

(a)    $t = \dfrac{x - 47.712}{5.772} = .17325x - 8.2661$

and write

(b)    $y = \dfrac{N}{\sigma}\,\phi(t) = 173.25\,\phi(t).$

Appropriate values to assign $x$ in equation (a) are the end-$x$ and
mid-$x$ values of the given distribution. The use of a computing
machine in changing $x$ values into corresponding $t$ values is explained
in §6, Chapter IV. Thus we obtain the values in the second column
of Table 29. We may then enter the table in the Appendix
for the corresponding ordinates, $\phi(t)$. These are converted into $y$
values by equation (b). The curve may then be drawn by plotting
the $x$ and $y$ values. (Figure 25.) The curve should be drawn so
as to be symmetrical with respect to the ordinate at the mean and
its points of inflection should be at a distance from the mean equal
to $\sigma$. The student should observe that every pair of $(x, y)$ values
computed in Table 29 furnishes two points for the graph, each symmetrical
to the other with respect to the mean ordinate. Both points should be
used in drawing the curve but only the computed points should be left
permanently in the graph.

After the curve is drawn, the histogram for the observed data may
be constructed. The column headed $f/c$ gives the heights of the
rectangles on the same scale as the ordinates of the curve.

Table 29. $t = .17325x - 8.2661$, $y = 173.25\,\phi(t)$

    x         t        $\phi(t)$       y       f/c
  27.5     -3.502     .00086      0.15
  29.5     -3.155     .00275      0.48     0.25
  31.5     -2.809     .00772      1.34
  33.5     -2.462     .01927      3.34     3.50
  35.5     -2.116     .04253      7.37
  37.5     -1.769     .08344     14.46    14.00
  39.5     -1.423     .14494     25.11
  41.5     -1.076     .22361     38.74    43.00
  43.5     -0.730     .30563     52.95
  45.5     -0.383     .37072     64.23    61.25
  47.5     -0.037     .39866     69.07
  49.5      0.310     .38023     65.87    65.75
  51.5      0.656     .32230     55.84
  53.5      1.003     .24124     41.79    39.00
  55.5      1.349     .16060     27.82
  57.5      1.696     .09469     16.41    16.75
  59.5      2.042     .04960      8.59
  61.5      2.389     .02299      3.98     5.75
  63.5      2.735     .00948      1.64
  65.5      3.082     .00346      0.60     0.75
  67.5      3.428     .00111      0.19

Fig. 25. Normal Curve Fitted to Histogram Representing Weight
Distribution of Glasgow Schoolgirls (Table 21)

The smooth curve is plotted from the points $(x, y)$ given in Table 29. The
column headed $f/c$ in that table gives the heights of the rectangles in the
histogram, $c = 4$. When both the curve and the histogram are to be drawn, it is best
to draw the curve first so that the presence of the histogram will not prejudice
one into trying to make the curve fit the histogram.

1 It is shown in Part II that a better estimate of the variance in the universe is
obtained by multiplying the variance of the observed distribution by $N/(N - 1)$.
Because of this fact some writers, denoting this result by $s^2$, define the variance of
an observed distribution by

$$s^2 = \frac{1}{N - 1} \sum_{i=1}^{k} f_i (x_i - \bar{x})^2.$$

The distinction between the two definitions is not an important one, in the
author’s opinion, for beginning students who are learning the descriptive
methodology of statistics. And in curve fitting, the numerical difference is negligible
because N is fairly large. The distinction is important, however, in the theory
of small samples (Part II).
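The columns of Table 29 may be reproduced by a short computation. The Python sketch below evaluates $\phi(t)$ directly from equation (4) instead of entering the Appendix table, which is an assumption of the sketch; the variable names are illustrative.

```python
import math

# Statistics of Table 21, as quoted in the text.
N, mean, sigma = 1000, 47.712, 5.772


def phi(t):
    """Equation (4), evaluated directly rather than read from the Appendix."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


rows = []
for i in range(21):
    x = 27.5 + 2 * i           # end-x and mid-x values, 27.5 to 67.5
    t = (x - mean) / sigma     # equation (a)
    y = N / sigma * phi(t)     # equation (b)
    rows.append((x, round(t, 3), round(phi(t), 5), round(y, 2)))

for x, t, ordinate, y in rows:
    print(f"{x:5.1f} {t:7.3f} {ordinate:8.5f} {y:6.2f}")
```

An occasional entry may differ from Table 29 by a unit in the last place, because the table rounds $t$ to three decimals before the ordinate is looked up while the sketch carries $t$ unrounded.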

8. Graduation. The areas under the fitted curve and over the 
class intervals are called theoretical frequencies. Thus in Figure 25 
the shaded area represents the theoretical frequency corresponding 
to the observed frequency which is represented by the rectangle the 
mid-point of whose base is 41.5 pounds. The determination of the 
theoretical frequencies is called “ graduation by the normal curve.” 
It is a process of smoothing out the data to fit the curve. The method 
is shown in Table 30 for the data represented by Figure 25. 

In order to enter a table of standard areas we must change the 
end-x values into t values. These are given in the third column of 
Table 30. They are part of the values already computed for Table 29. 

The entries in the column headed $A = \int_{-\infty}^{t}$ are the (cum $f$)/$N$
values of the standard curve for the given end-points. The entries in
the column headed $\Delta A$ are obtained by differencing the preceding
column. (See last paragraph of §9, Chapter I.) They are the percentages
$p = f/N = \Delta A$ to be expected in the various intervals on
the hypothesis of a normal distribution. Therefore $N\Delta A$ gives the
numbers to be expected, that is, the theoretical frequencies.
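The operations of a graduation (cumulating, differencing, and multiplying by N) can be sketched as follows. The Python sketch uses `NormalDist.cdf` from the standard library for the column $A$ in place of the Appendix table, which is an assumption; the class boundaries and observed frequencies are those of Table 30.

```python
from statistics import NormalDist

N = 1000
fitted = NormalDist(mu=47.712, sigma=5.772)

# Class boundaries of Table 30. The two end classes are left open
# so that the theoretical frequencies add up to N.
boundaries = [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]
observed = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

cumulative = [0.0] + [fitted.cdf(x) for x in boundaries] + [1.0]        # column A
differences = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]   # column Delta A
theoretical = [N * d for d in differences]                              # column N * Delta A

for obs, theo in zip(observed, theoretical):
    print(f"{obs:4d}  {theo:6.1f}")
```

The printed theoretical frequencies should agree with the last column of Table 30 to within a few tenths, the small differences arising from the three-decimal $t$ values and four-decimal areas used in the table.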

The student should study this table until he becomes familiar with
all the operations involved and what they mean. He should distinguish
between the purposes of Tables 29 and 30.

(Marginal figure: for each rectangle of the histogram, $f$ = area, $c$ = base, $f/c$ = height.)

Table 30

 Observed     Boundary                 $A =$                     $N\Delta A$ =
 Frequency       x          t       $\int_{-\infty}^{t}$      $\Delta A$      Theoretical Frequency
              $-\infty$       $-\infty$      .0000
      1                                            .0025         2.5
               31.5      -2.809     .0025
     14                                            .0147        14.7
               35.5      -2.116     .0172
     56                                            .0602        60.2
               39.5      -1.423     .0774
    172                                            .1553       155.3
               43.5      -0.730     .2327
    245                                            .2527       252.7
               47.5      -0.037     .4854
    263                                            .2587       258.7
               51.5       0.656     .7441
    156                                            .1674       167.4
               55.5       1.350     .9115
     67                                            .0679        67.9
               59.5       2.042     .9794
     23                                            .0175        17.5
               63.5       2.735     .9969
      3                                            .0031         3.1
               $\infty$         $\infty$      1.000
  Totals                                          1.0000      1000.0

9. Purpose of a Graduation. If, for the distribution of graduated
frequencies, the mean, standard deviation, and total frequency are
found, their values will be precisely those of the corresponding moments
in the observed frequency distribution. This must be so,
because these were the conditions imposed in the process of graduation.
Moreover, the observed values of skewness and kurtosis as
given by $\alpha_3$ and $\alpha_4$ will not differ appreciably from the theoretical
values if the fitting of the normal curve to the observed distribution
was justified.

Since the above parameters characterize a distribution, the ob- 
serving student may wonder why a distribution should be graduated 
if the values of these constants are unaltered in the process. 

There are three main reasons why a student should be taught to graduate a 
curve. The first, and least important, has to do with the use of a smooth curve 
in place of a jagged sample. The second, and most important, is that it is 
necessary for the mathematical development of statistics that the mathematician 
should be told what assumptions he may make. These usually depend on the 
types of frequency curves which can be depended on to fit phenomena. . . . 
A third reason, intermediate in importance between the other two, is that in 
testing a priori theories in various fields, it is often necessary to test the efficacy 
of the frequency distributions which are results of these theories. 1 

The second and third of the above reasons may seem somewhat 
abstruse, but it is not easy to give completely satisfactory explanations
of them at this level of exposition. About all we can say at
this time is that the distribution of variation of a variable x about its 
mean value is a fundamental statistical concept and in certain theo- 
retical investigations it is very important that we have mathemati- 
cal functions which are capable of representing such distributions. 
This is particularly true in sampling theory which will be discussed 
in Part II. 

The first reason is more readily understood. Occasionally in 
practical problems it may be desirable to use the theoretical fre- 
quencies obtained by graduation in place of the observed data which 
probably contain irregularities due in part to grouping, in part to 
sampling fluctuations. We cite here two illustrations.

Example 1. A company which operates a chain of men’s haberdashery stores 
planned to bring out a new line of about 100,000 light weight sport shirts suitable 
for camping, hunting, etc. The question arose as to the determination of the 
number of each size that should be ordered from the factory. Their previous 
distribution of sizes had not been satisfactory because the demand for certain 
sizes had been different from the number manufactured. Therefore the statistical 
department was requested to recommend the distribution of the proposed order 
according to neck sizes. The solution of the problem hinged upon the availa- 
bility of data giving the measurements of neck circumferences of a large sample 
of men. Satisfactory data were found in the “Reports of the Medical Department
of the United States Army in the World War,” which gave a table of the
neck measurements in centimeters of 95,102 white troops at demobilization.
Since these data are tabulated in class intervals which are slightly different from
the ranges used in standard shirt-band sizes, a slight adjustment was necessary.
But essentially a normal curve was fitted to this distribution and the graduated
frequencies were taken as the number of potential customers for each shirt size.
The result was quite satisfactory.

1 Journal of the American Statistical Assoc., vol. XXVI, March 1931, Supplement,
p. 36.

Example 2. A well known and interesting illustration of the desirability of 
smoothing occurs in the census returns. The census takers’ records show more 
persons alive at age 30 than at age 29, more at age 35 than at age 34, more at 40 
than at 39, etc. This is probably due to the fact that men (as well as women) 
do not tell their exact ages. A person who is actually 41 or 42 and known to be 
40 or so, says he is 40. The recorded data show artificial bumps at every age 
which is a multiple of 5. Naturally the Census Bureau prefers the smoothed
results to the observed. The student should not infer that the curve used to
smooth these data is the normal type. The “life curve” is a continuously
decreasing function. However, the same kind of quinquennial irregularity occurs
in other actuarial data which do approximate the form of a normal curve. Many
examples are given in Elderton, Frequency Curves and Correlation.


10. Probability. A frequency curve is sometimes called a proba- 
bility curve. The link connecting frequencies with probabilities 
has its starting point in the following definition: 

Definition. If out of N mutually exclusive and equally likely
events, f are distinguished by some property A, the probability of an
event bearing the property A is f/N.

The definition implies that probability is measured by a number in 
the range 0 to 1, the lower limit denoting impossibility and the upper 
limit denoting certainty. 

Since the total area under the curve represented by (4) is unity, 
any partial area denoted by (7) can be interpreted as the prob- 
ability that a value of t selected at random from a normal distribu- 
tion (4) lies between t = a and t = b . 

Example 1. Refer to the data of Table 8, Chapter I. Let us assume that a 
normal curve was fitted to this distribution and that the fit seemed (by visual 
inspection) to be reasonably good. Generalizing on the experience shown in the 
table, the telephone company wishes to estimate the probability that a call (of 
the same type of message as that in the table) will be between (say) 500 seconds 
and 600 seconds in length. 

Solution. Using (10),

a = (500 - 477.3)/148.5 = 0.15,
b = (600 - 477.3)/148.5 = 0.83.

$$P = \int_{0.15}^{0.83} \phi(t)\,dt = 0.24.$$

Example 2. Referring to Example 1 above, find the probability that the length
of a telephone call will differ numerically from the mean of the table by as much
as 5 minutes.

Solution. We find $|t| = 300/148.5 = 2.02$. The probability of a deviation
numerically smaller than this is

$$P = \int_{-2.02}^{2.02} \phi(t)\,dt = 0.96,$$

approximately. Then the probability of a numerical deviation as large as (or larger
than) 300 seconds is $Q = 1 - P = 0.04$. This would be represented graphically
by the area under the curve outside $t = \pm 2.02$.
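The two probabilities above can be checked without printed tables. The following is a minimal sketch, assuming Python's `statistics.NormalDist` as a stand-in for the table of areas; the helper name `area_between` is introduced here only for illustration. As in the text, t is rounded to two decimals before the area is taken.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1) for the variable t.
STD_NORMAL = NormalDist()


def area_between(a: float, b: float) -> float:
    """Area under the standard normal curve between t = a and t = b."""
    return STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)


mean, sd = 477.3, 148.5  # seconds, as used in Example 1

# Example 1: a call between 500 and 600 seconds in length.
a = round((500 - mean) / sd, 2)
b = round((600 - mean) / sd, 2)
p_example1 = area_between(a, b)

# Example 2: a deviation from the mean of 300 seconds or more, either way.
t = round(300 / sd, 2)
p_within = area_between(-t, t)
q_outside = 1 - p_within

print(f"a = {a}, b = {b}, P = {p_example1:.2f}")
print(f"|t| = {t}, P = {p_within:.2f}, Q = {q_outside:.2f}")
```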

11. Probability Paper. The cumulative frequencies for the normal
curve are given by $A = \int_{-\infty}^{t} \phi(t)\,dt$. As $t$ varies from $-\infty$ to $+\infty$,
A varies from 0 to 1, and for the finite range $t = \pm 3$ (commonly met
in practice) A varies from 0.00135 to 0.99865. (Verify.) Regarding
A as a function of t, values of (t, A) from the tables may be plotted
and the resulting points joined by a smooth curve.
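The verification asked for above may be carried out with a few lines of code instead of the tables. This is only a sketch, assuming `statistics.NormalDist` for the cumulative area; the function name `cumulative_area` is an illustrative choice.

```python
from statistics import NormalDist


def cumulative_area(t: float) -> float:
    """Cumulative area A under the standard normal curve up to t."""
    return NormalDist().cdf(t)


lower = cumulative_area(-3.0)
upper = cumulative_area(3.0)
print(f"A(-3) = {lower:.5f}, A(+3) = {upper:.5f}")
```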



[Fig. 26. Ogive of the Normal Curve; horizontal scale t from -3 to 3]


When graphed on an algebraic scale this curve is the ogive of the
normal curve. It is also called the integral curve of $\phi(t)$. As indicated
in Figure 26, the ordinate of the ogive is zero at $t = -\infty$,
.5 at $t = 0$, and the ogive approaches the line $A = 1$ asymptotically.

Now imagine the vertical scale of Figure 26 stretched in such a
way that the ogive becomes a straight line. The stretching required
will be least around the line $A = 0.5$ and will gradually increase as
the distance from this line increases.

Paper so ruled that the (t, A) graph is a straight line is called
probability paper. It is readily obtainable 1 and is convenient for
many purposes. Thus, by plotting cum f for an observed distribution
on probability paper, one may observe how closely it approximates
a straight line and hence get an idea of how nearly normal it
is. One may thus locate graphically the median, quartiles, etc., and
estimate frequencies between given limits.

1 The Codex Book Company, New York.

A more complete discussion giving references to writers who suggested
and developed the use of probability paper may be found in
the Journal of the American Statistical Association, vol. XXVI, June
1931, p. 178.

Exercises 

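1. Construct three normal curves on the same axes according to the following
specifications. Compute ordinates at intervals of $.5\sigma$ from the mean in
the range $\bar{x} \pm 3\sigma$.

| Curve | $\sigma_x$ | $\bar{x}$ | N |
|---|---|---|---|
| A | 10 | 50 | 400 |
| B | 10 | 50 | 800 |
| C | 10 | 50 | 1200 |

Suggested form for computations:

| x | t | $\phi(t)$ | y (A) | y (B) | y (C) |
|---|---|---|---|---|---|
| 20 | -3 | | | | |
| ... | ... | | | | |
| 80 | 3 | | | | |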

2. Construct three normal curves on the same axes according to the following
specifications. Compute ordinates at intervals of $.5\sigma$ from the mean.

| Curve | $\sigma_x$ | $\bar{x}$ | N |
|---|---|---|---|
| A | 15 | 50 | 1000 |
| B | 10 | 50 | 1000 |
| C | 5 | 50 | 1000 |

Suggestion: Observe that:

$y_C = 200\,\phi(t)$

$y_B = \tfrac{1}{2}\,y_C = 100\,\phi(t)$

$y_A = \tfrac{1}{3}\,y_C = \tfrac{200}{3}\,\phi(t)$

3. Verify the entries in Tables 29 and 30. 

4. For the following distribution:

(a) Find the equation of the best fitting normal curve, and plot the curve
and histogram.

(b) Find the graduated frequencies.

| mid-x | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| f | 1 | 4 | 6 | 4 | 1 |

5. Graduate the distribution in Table 8, §11, Chapter I. Also find the ordinates
of the best fitting normal curve and plot the curve and histogram.

6. A distribution of the weekly wages of 906 anthracite miners showed the
following results:

x̄ = $36.13        α₃ = 0.007
σₓ = $8.87         α₄ = 3.02

Assuming a normal distribution, estimate the number of the 906 miners
who received weekly wages (a) in excess of $45, (b) less than $25.

7. An urban electric railway company operating a large city subway uses
thousands of electric light bulbs in its underground stations. On January
1, 1947, the company put into service 5000 new light bulbs. Let it be
assumed that these 5000 bulbs will have a mean life of 50 days, a standard
deviation of 19 days, and that their lives conform to the normal
curve.

If January 1 is counted as a full day in the life of the bulbs: (a) How
many bulbs out of the 5000 new ones would have had to be replaced by
midnight January 31, 1947? (b) How many by March 10, 1947?

8. Which properties of the normal curve may be used as criteria in passing
judgment on the normality of an observed distribution? Would you say
that the distributions referred to in Table 23 are approximately normal?

9. Graph the ogive of the normal curve by plotting values of (t, A) in the range
$t = \pm 3$, (a) on an algebraic scale, (b) on probability paper.

10. What famous mathematicians’ names are associated with the normal curve?
When did these men live? Which of them should most appropriately be
credited with the discovery of this curve?

11. (Camp) The standard deviation of a certain set of 100,000 high school
grades was 11%, and the mean grade was 78%. Assume the distribution
to have been normal, and, being careful not to confuse percentage in the
sense of grade with a percentage of frequency, answer the following questions:
How many grades were (a) above 90%, (b) below 70%? (c) What
was the highest grade of the lowest 1000? (d) Within what limits did the
middle 90,000 lie? (e) What was the semi-interquartile range?

12. (Camp) Answer all the questions of Exercise 11 with reference to a set of
100,000 grades in which the median was 83% and was 90%. Also
find $\sigma_x$.

13. In a certain normal distribution, $N = 1000$, $\bar{x} = 50$, $\sigma_x = 10$. For this
distribution:

(a) Convert the following x’s into the corresponding t’s,






Q 





























(b) Find from the tables the values of $\phi(t)$ for the t values in (a).

(c) Convert the $\phi(t)$ values obtained in (b) into y values.

(d) Plot the (x, y) values in (a) and (c) and draw a smooth curve through
them.

(e) Find the cumulative relative frequencies, $A = \int_{-\infty}^{t} \phi(t)\,dt$, for the values of t
in (a).

(f) Difference your results in (e) by finding $\Delta A$.

(g) Convert the percentages in (f) into frequencies.

(h) Explain the meaning of your results in (g) with reference to the figure
for (d).

(i) Find the number of variates between $x = 42$ and $x = 74$.

(j) Find the values of x for which cum f = 250, 600, 750, respectively.

14. Given a normal distribution in which $N = 800$, $\bar{x} = 40$, $\sigma_x = 7$. Find the
numerical value of each of the following.

$Q_1$, $Q_2$, $Q_3$, $E$, $N\int_{t=0}^{t=3} \phi(t)\,dt$.

15. Suppose $N = 5000$ variates are normally distributed such that $\bar{x} = 50$ and
$E = 13.49$. Without using the tables find the value of the following:
quartiles, median, mode, standard deviation, mean deviation, x for which
cum f = 1250.

16. Suppose there are N values of a variable v which are normally distributed
with mean = 0 and variance = 25.

(a) Give the equation of the curve which represents the distribution.

(b) If there are 793 values between $v = -5$ and $v = 0$, determine N.

(c) What percent of N have values larger than $v = 10$?

(d) Determine the value of v for which cum f = .75N.


CHAPTER VII 
CURVE FITTING 

1. Empirical Expressions. The preceding chapters have dealt
with the description and characterization of frequency distributions.
We have considered three general methods of description: (1) graphical
devices, (2) the method involving calculation of averages and
measures of dispersion, (3) the method which is sometimes called
analytical. This latter method consists in describing the distribution
by an equation, and we considered only one such analytical expression,
the normal curve. However, another branch of statistics is concerned with
data which may not be classed under frequency distributions,
but which may be described by simple equations.

When one variable is a function of another in applied
mathematics the mathematical relation between them is not
always known. As we mentioned in Chapter II, the only
information regarding this functional relationship may be
a set of pairs of values obtained by experimental or observational
means. These pairs of values may be regarded as
coordinates of points and plotted. In doing so, the values
of the variable which is regarded as independent are taken as
abscissas, and those of the dependent variable as ordinates.

Example 1. Expectation of Life 1 at various Ages

| Age | Expectation |
|---|---|
| 20 | 42.20 |
| 30 | 35.33 |
| 40 | 28.18 |
| 50 | 20.91 |
| 60 | 14.10 |
| 70 | 8.48 |
| 80 | 4.39 |
| 90 | 1.42 |

[Figure: expectation of life plotted against age, ages 20 to 90]

1 By expectation of life at any age is meant the average number of years lived
by persons attaining that age, as given in the American Experience Mortality
Table.

The general problem in such cases is to find, if possible, an analytic
expression of the form $y = f(x)$ for the functional relationship suggested
by the data. Equations obtained to fit observed data as well
as possible are called empirical to distinguish them
from the rational expressions of pure mathematics which
can be derived from reasoning. This general problem
is called curve fitting. It is also sometimes referred to
as “smoothing” the given data.

Example 2. Yearly Production of Cigarettes in the United States

| Year | Billions |
|---|---|
| 1923 | 66.7 |
| 1924 | 72.7 |
| 1925 | 82.3 |
| 1926 | 92.1 |
| 1927 | 93.0 |
| 1928 | 100.6 |

[Figure: yearly production in billions plotted against year, 1923 to 1928]

We will consider three types of functions: linear,
quadratic, and exponential.

2. Linear Functions. We know from algebra that the
general form of a linear equation in two variables is

$$Ax + By = C$$

where A, B, and C are arbitrary constants.

When $B \neq 0$, the equation may be solved for y, giving
$y = -(A/B)x + C/B$ which is of the form

(1)    $y = mx + k$

and which is the form we will ordinarily use to represent a straight line.

The special cases where A or B or C is zero are as follows:

When $A = 0$, then $y = C/B$, which is of the form $y = k$. This is
a line parallel to the x-axis. When $B = 0$, the equation takes the
form $x = k$, which is a line parallel to the y-axis. When $C = 0$, then
$Ax + By = 0$ which is a line passing through the origin.

The graph of (1) is a straight line (which explains the term “linear”).
A characteristic property of a linear function is revealed at once by
its graph. This is the fact that the ratio of a change in y to the
corresponding change in x is constant. Thus, if two points $(x_1, y_1)$
and $(x_2, y_2)$ are chosen on the line, the value of the ratio

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

is independent of the points chosen. This ratio gives the average
rate of change of any function over the interval $\Delta x = x_2 - x_1$. In
the case of a linear function, m defines the rate of change of the function.

Graphically, m is the slope of the line. It is the tangent of the
angle of inclination $\alpha$ (alpha) which the line makes with the positive
x-axis. 1 Lines having the same slope are parallel, and conversely.

It is shown in analytic geometry that we may obtain the slope of a
straight line from its equation if we solve for y and take the coefficient
of x. Thus in $2x - y = 5$, $y = 2x - 5$ and the slope is 2.

Conversely, if we know the slope of a line and the coordinates of any
point on the line we can write its equation from the relation

(2)    $y - y_1 = m(x - x_1)$

which is called the point-slope form of a straight line. Thus, given
that (2, -1) is a point on a line whose slope is 2, the equation of the
line is therefore $y + 1 = 2(x - 2)$ or $2x - y = 5$.

Or again, remembering that m is defined by a ratio involving the
coordinates of two points on a line, we can obtain the equation of a
line if we know any two points which lie on it. From the definition
of m and (2), we have

(3)    $y - y_1 = \dfrac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$

which is known as the two-point form of a straight line. Thus, given
that (2, -1) and (6, 7) are two points on a line, its equation is

$$y + 1 = \frac{7 + 1}{6 - 2}\,(x - 2) \quad \text{or} \quad 2x - y = 5.$$
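The two-point form lends itself to a short computation. The sketch below is illustrative only: the function name `line_through` is an assumption of this example, integer coordinates are assumed so that exact fractions can be used, and the vertical line is rejected because its slope does not exist.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, k) of the line y = m*x + k through two points.

    Coordinates are assumed to be integers so the arithmetic stays exact.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no slope; division by zero is excluded.
        raise ValueError("vertical line: the slope m does not exist")
    m = Fraction(y2 - y1, x2 - x1)   # ratio of change in y to change in x
    k = y1 - m * x1                  # from the point-slope form at (x1, y1)
    return m, k


# The two points used in the text: (2, -1) and (6, 7).
m, k = line_through((2, -1), (6, 7))
print(f"y = {m}x + ({k})")
```

For the points of the text this gives slope 2 and intercept -5, that is, $2x - y = 5$.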

1 When the line is vertical, $\alpha = 90^\circ$ and m does not exist. Then $\Delta x = 0$ and
division by zero is excluded in our algebra.

3. Quadratic Function. A quadratic function of a variable v
is a polynomial of the second degree in v which may be expressed in
the form $Av^2 + 2Bv + C$ where A, B, and C are fixed real numbers.
The minimum value of such a function is useful in statistics. We
have

$$Av^2 + 2Bv + C = \frac{1}{A}\left[A^2v^2 + 2ABv + AC\right] = \frac{1}{A}\left[(Av + B)^2 + (AC - B^2)\right].$$

Since $(Av + B)^2$ is positive or zero and $(AC - B^2)$ does not involve
the variable, we have the following:

Theorem I. If A is positive the minimum value of $Av^2 + 2Bv + C$
occurs when $Av + B = 0$; the minimum value is $(AC - B^2)/A$.

The graph of the equation $y = Av^2 + 2Bv + C$, $(A > 0)$, is a parabola which
opens upward and whose vertex is where $v = -B/A$. Of course the function has its
minimum value at this vertex, viz.: $(v_0, y_0)$ where $v_0 = -B/A$, $y_0 = (AC - B^2)/A$.
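Theorem I can be tried on a particular quadratic. The following sketch is only an illustration: the function name `quadratic_minimum` and the sample polynomial $v^2 + 6v + 10$ (so that A = 1, B = 3, C = 10) are assumptions of this example, not taken from the text.

```python
from fractions import Fraction


def quadratic_minimum(A, B, C):
    """Vertex (v0, y0) of A*v**2 + 2*B*v + C, assuming A > 0 (Theorem I)."""
    if A <= 0:
        raise ValueError("Theorem I requires A to be positive")
    A, B, C = Fraction(A), Fraction(B), Fraction(C)
    v0 = -B / A                  # where A*v + B = 0
    y0 = (A * C - B * B) / A     # the minimum value
    return v0, y0


def quadratic(A, B, C, v):
    """Value of A*v**2 + 2*B*v + C."""
    return A * v * v + 2 * B * v + C


# Hypothetical example: v**2 + 6v + 10, i.e. A = 1, 2B = 6, C = 10.
v0, y0 = quadratic_minimum(1, 3, 10)
print(f"minimum value {y0} at v = {v0}")

# Values a little to either side of the vertex are larger.
neighbors = [quadratic(1, 3, 10, v0 + d) for d in (-1, Fraction(-1, 10), Fraction(1, 10), 1)]
```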

Exercises 

1. (Wilson and Tracy) The premium ($y) on a $1000 life insurance policy for
various ages (x years) is given in the following table. Draw a graph exhibiting
y as a function of x. Estimate from the graph the premium at
age 32 and at age 43; also the age at which the premium is $52.

| x | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 |
|---|---|---|---|---|---|---|---|---|---|
| y | 18.78 | 21.02 | 23.86 | 27.54 | 32.36 | 38.83 | 47.68 | 59.88 | 76.94 |

2. Find an equation of each of the lines through two points given as follows:

(a) (2,6), (4,5); (b) (0,3), (1,6).

3. Find the equation of a line through the point (2, 3) and parallel to the line
$4x + 5y = 7$.

4. (a) Find the value of x for which f(x) = 2a; 2 — Sx + 0 has a minimum 

value. (b) What is this minimum value? (c) Draw a graph of y = f(x)
and show the meaning of your answers to (a) and (b).

5. How would the theorem in §3 be affected if A < 0?

6. Prove that the second moment of x is a minimum when taken about the
mean of x.

Hints. Solution 1. Let

$$f(v) = \frac{1}{N}\sum_{i=1}^{N}(x_i - v)^2 = v^2 - 2\bar{x}v + \frac{1}{N}\sum_{i=1}^{N} x_i^2.$$

By the theorem of §3, show that f(v) is a minimum when $v = \bar{x}$.

Solution 2. By definition,

$$\mu_2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \nu_2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - x_0)^2.$$

Is $\mu_2 < \nu_2$?

Solution 3, for calculus students. From f(v) as derived above,

$$f'(v) = -\frac{2}{N}\sum_{i=1}^{N}(x_i - v).$$

Set $f'(v) = 0$ and solve for v. Since $f''(v) > 0$, $v = \bar{x}$ yields a minimum,
not a maximum.

7. Show that the value of k for which $f(k) = Nk^2 + 2k\left(m\sum_{1}^{N} x_i - \sum_{1}^{N} y_i\right) + C$ is
a minimum is defined by

$$m\sum_{1}^{N} x_i + Nk = \sum_{1}^{N} y_i.$$

4. Fitting a Straight Line. The preceding discussion is intended
as a basis for the presentation of certain methods of fitting a line to
data. The equation $y = mx + k$ represents a family or set of
lines corresponding to different values of the arbitrary constants
m and k. As noted previously, such constants are called parameters.
The process of finding the best fitting line for any given data consists
in determining m and k. By “best fitting” we mean best under a
criterion of approximation specified by a method. We will consider
three such methods: (a) graphical, (b) the method of moments of
ordinates, (c) the method of least squares.

5. Graphically. A straight line is drawn (preferably with the aid
of a transparent ruler) to fit as closely as possible the plotted points.
To find the equation of this line, select two points on the line and estimate
their coordinates $(x_1, y_1)$ and $(x_2, y_2)$. Substituting these coordinates
in the “two-point” form of the line (3), we get the desired
equation.

If the first point is chosen so that $x_1 = 0$ the numerical work
of simplifying the equation is somewhat lessened.

Example 3. Fit a line graphically to the data in Example 2.

We take the origin of x at 1923, hence from the figure ($x_1 = 0$, $y_1 = 67$)
and ($x_2 = 5$, $y_2 = 100$).

By equation (3),

$$y - 67 = \frac{100 - 67}{5}\,x.$$

Therefore,

$$y = 6.6x + 67$$

is the required equation.

| x | y |
|---|---|
| (1923) 0 | 66.7 |
| 1 | 72.7 |
| 2 | 82.3 |
| 3 | 92.1 |
| 4 | 93.0 |
| (1928) 5 | 100.6 |

The graphical method is open to the objection that it depends 
upon the judgment of the investigator. Different people will lo- 
cate the line in different positions and therefore obtain different equa- 
tions. However, where only approximate results are needed it is 
usually quite satisfactory. 

6. Method of Moments. In equation (1) y is not only a function
of x but it is also a function of the parameters m and k. This functional
relationship may be expressed symbolically by the notation
f(x, m, k). Given the functional form of a curve $y = f(x, a, b, c, \cdots)$
the parameters $a, b, c, \cdots$ may be determined by obtaining
expressions for as many moments of the computed or functional y’s as
there are parameters in the function and equating these to the numerical
moments of corresponding order of the observed or empirical y’s.
A solution of the resulting equations, theoretically possible, gives
the “best” values of the parameters. This is the method of moments
of ordinates. For a set of N values of $(x_i, y_i)$ the rth moment of y is
defined by the expression

$$\frac{1}{N}\sum_{i=1}^{N} x_i^{\,r}\, y_i$$

where r is zero or a positive integer.

In fitting a straight line by this method we obtain two equations
involving m and k if we equate the zeroth and first moments of the
observed y’s to the zeroth and first moments, respectively, of the y’s
computed from the assumed equation $y = mx + k$. All moments
are taken about the origin of x. These two equations may then be
solved for m and k. The procedure will be made clear by the figure
and explanation below.



| x | ${}_o y$ (observed) | x | ${}_c y$ (computed) |
|---|---|---|---|
| $x_1$ | $y_1$ | $x_1$ | $mx_1 + k$ |
| $x_2$ | $y_2$ | $x_2$ | $mx_2 + k$ |
| ... | ... | ... | ... |
| $x_i$ | $y_i$ | $x_i$ | $mx_i + k$ |
| ... | ... | ... | ... |
| $x_N$ | $y_N$ | $x_N$ | $mx_N + k$ |

Suppose we are given N pairs of values of x and y. Denote the
given or observed y’s by ${}_o y$ and the computed y’s by ${}_c y$. For the
observed y’s, the first moment is $\frac{1}{N}\sum x_i y_i$ and the zeroth moment is
$\frac{1}{N}\sum y_i$. By a “computed y” corresponding to any value of x we
mean the result obtained by substituting that value of x in the equation
$y = mx + k$, and solving for y. Thus, for any value of x, say
$x_i$, we obtain $mx_i + k$ for the corresponding computed $y_i$. Graphically,
it is an ordinate of the line. Therefore, the first moment of
the computed y’s is $\frac{1}{N}\sum x_i(mx_i + k)$, and the zeroth moment is
$\frac{1}{N}\sum x_i^{\,0}(mx_i + k)$. Applying the principle of moments we have

| | observed | | computed |
|---|---|---|---|
| zeroth moment | $\sum y_i$ | = | $\sum (mx_i + k)$ |
| first moment | $\sum x_i y_i$ | = | $\sum x_i(mx_i + k)$ |

where the summations run from 1 to N.

To solve for m and k we write the preceding equations in the following
form:

(4)    $$\begin{cases} m\sum x_i + kN = \sum y_i \\ m\sum x_i^2 + k\sum x_i = \sum x_i y_i. \end{cases}$$

By determinants,

(5)    $$m = \frac{\begin{vmatrix} \sum y & N \\ \sum xy & \sum x \end{vmatrix}}{\begin{vmatrix} \sum x & N \\ \sum x^2 & \sum x \end{vmatrix}} = \frac{(\sum y)(\sum x) - N\sum xy}{(\sum x)^2 - N\sum x^2}, \qquad k = \frac{\begin{vmatrix} \sum x & \sum y \\ \sum x^2 & \sum xy \end{vmatrix}}{D} = \frac{(\sum x)(\sum xy) - \sum y \sum x^2}{D}.$$

The determinant D in the expression for k is the same as that in the
denominator of the expression for m. [In order to solve equations
(4) for the values (5) it is assumed that D does not vanish.] The
terms in the expressions for m and k refer to the original data. When
these expressions have been evaluated they replace m and k in the
equation $y = mx + k$.


Example 4. Find by the method of moments the best fitting line for the data
in Example 2.

| x | y | xy | $x^2$ |
|---|---|---|---|
| 0 | 66.7 | 0 | 0 |
| 1 | 72.7 | 72.7 | 1 |
| 2 | 82.3 | 164.6 | 4 |
| 3 | 92.1 | 276.3 | 9 |
| 4 | 93.0 | 372.0 | 16 |
| 5 | 100.6 | 503.0 | 25 |
| 15 | 507.4 | 1388.6 | 55 |

$$m = \frac{(507.4)(15) - 6(1388.6)}{(225) - 6(55)} = 6.86, \qquad k = \frac{15(1388.6) - 55(507.4)}{D} = 67.4.$$

Therefore,

$$y = 6.86x + 67.4.$$
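The arithmetic of Example 4 may be checked by coding formulas (5) directly. This is a minimal sketch; the function name `fit_line_moments` is an assumption of this example, and the data are the six pairs of Example 2 with the origin of x at 1923.

```python
def fit_line_moments(xs, ys):
    """Slope m and intercept k of y = m*x + k by formulas (5)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    d = sum_x * sum_x - n * sum_xx          # the determinant D
    if d == 0:
        raise ValueError("D vanishes; equations (4) cannot be solved")
    m = (sum_y * sum_x - n * sum_xy) / d
    k = (sum_x * sum_xy - sum_y * sum_xx) / d
    return m, k


# Data of Example 2 (x counted in years from 1923, y in billions).
xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]

m, k = fit_line_moments(xs, ys)
print(f"y = {m:.2f}x + {k:.1f}")
```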


7. An Alternative Procedure. In practice, it is sometimes easier to
remember the procedure of fitting a line by the method of moments if
one obtains the equations in (4) directly from the data instead of using
the formulas for m and k. This will involve the following three steps:

(a) Substitute each of the given pairs of values in $y = mx + k$ and
add the corresponding members of the resulting “equations.” This
gives the first equation in (4).

(b) Multiply each “equation” in (a) by the coefficient of m in
that “equation” and add the corresponding members of the resulting
“equations.” This gives the second equation in (4).

(c) Solve the equations simultaneously. This will give the
required values of m and k.

The algebraic statements which we designated “equations” (denoting
that the statements are only approximately true) are called
observation equations in the theory of errors. A linear combination
of a set of linear observation equations is a true equation.

Example. Verify, for the data in Example 2, that the above procedure gives
the same values of m and k as the formulas.

Step (a)

66.7 = 0m + k
72.7 = 1m + k
82.3 = 2m + k
92.1 = 3m + k
93.0 = 4m + k
100.6 = 5m + k

507.4 = 15m + 6k

Step (b)

72.7 = m + k
164.6 = 4m + 2k
276.3 = 9m + 3k
372.0 = 16m + 4k
503.0 = 25m + 5k

1388.6 = 55m + 15k

Step (c)

Solving the equations, we obtain m = 6.86, k = 67.4, as before.
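Steps (a), (b), and (c) can be carried out mechanically. The sketch below is one possible arrangement, not the only one: each observation “equation” is held as a triple (left member, coefficient of m, coefficient of k), exact fractions are used so that the sums agree with the hand computation, and the names `normal_equations` and `solve_pair` are assumptions of this example.

```python
from fractions import Fraction


def normal_equations(xs, ys):
    """Build the two equations of (4) by steps (a) and (b)."""
    # One observation "equation" y = x*m + 1*k per pair, as (y, coef_m, coef_k).
    observations = [(Fraction(str(y)), Fraction(x), Fraction(1))
                    for x, y in zip(xs, ys)]
    # Step (a): add the corresponding members.
    eq1 = tuple(sum(column) for column in zip(*observations))
    # Step (b): multiply each "equation" by its coefficient of m, then add.
    weighted = [(cm * y, cm * cm, cm * ck) for y, cm, ck in observations]
    eq2 = tuple(sum(column) for column in zip(*weighted))
    return eq1, eq2


def solve_pair(eq1, eq2):
    """Step (c): solve c = a*m + b*k for the pair of equations."""
    c1, a1, b1 = eq1
    c2, a2, b2 = eq2
    det = a1 * b2 - a2 * b1
    m = (c1 * b2 - c2 * b1) / det
    k = (a1 * c2 - a2 * c1) / det
    return m, k


xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]

eq1, eq2 = normal_equations(xs, ys)
m, k = solve_pair(eq1, eq2)
print(f"{float(eq1[0])} = {eq1[1]}m + {eq1[2]}k")
print(f"{float(eq2[0])} = {eq2[1]}m + {eq2[2]}k")
print(f"m = {float(m):.2f}, k = {float(k):.1f}")
```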


8. Least Squares. Case I. A standard method of fitting a curve
to empirical data is one known as the method of least squares. Assume,
as before, that the plotted data suggest the linear relationship
$y = mx + k$. Let d represent the difference between the ordinate of
any given point and the corresponding ordinate of the line, that is,
$d_i = [y_i - (mx_i + k)]$. These differences are called residuals. The
method of least squares is based upon the following principle.

Principle of Least Squares. The “best” estimate of a parameter
is that for which the sum of weighted squares of the residuals is a
minimum. 1

The sum is to be taken over all the observations that are subject
to error. We shall assume that the observations are all of equal
weight; consequently we may let each of the weights be unity. Then
the parameters m and k are estimated by imposing the condition that
$\sum_{i=1}^{N} d_i^2$ be a minimum. Now

$$\sum d^2 = \sum [y - (mx + k)]^2$$

(6)    $$= Nk^2 + 2mk\sum x + m^2\sum x^2 - 2k\sum y - 2m\sum xy + \sum y^2.$$

This is a quadratic polynomial in k. We may write it in the form

(6a)    $$f(k) = Nk^2 + 2k\left(m\sum x - \sum y\right) + C$$

where C represents the terms not involving k. Then according to
Theorem I the minimum value of f(k) occurs when

$$Nk + m\sum x - \sum y = 0.$$

The right member of (6) is also a quadratic polynomial in m. We
must choose m so that

$$m\sum x^2 + k\sum x - \sum xy = 0.$$

These last two equations 2 are the same as (4). When obtained by
the method of least squares they are called normal equations. Therefore
the values of m and k in (5) determine the best fitting line by
both the method of moments and of least squares. It can be shown
that the two methods give the same result for any polynomial. 3

It is interesting to observe that the sum of the residuals is zero.
Thus it can easily be shown that $\sum [y - (mx + k)] = 0$, when the
values given in (5) are substituted for m and k. This property and
the fact that the sum of the squares of the residuals is a minimum are
quite analogous to two similar properties of the arithmetic mean, viz.,

(1) The sum of deviations from the mean is zero.

(2) The sum of the squares of deviations from the mean is less than
the sum of the squares of such deviations taken from any other value,
i.e., $\mu_2 < \nu_2$.

1 For further information about this principle and a discussion of weights, the
following books are recommended: (a) Reference 4. (b) Statistical Mathematics,
by A. C. Aitken. Oliver and Boyd.

2 The student of calculus would obtain these equations as follows. Let
$S(m, k) = \sum (y - mx - k)^2$. Then differentiate $S(m, k)$ partially with respect
to m and k, respectively, and equate the results to zero.

3 See American Mathematical Monthly, September, 1923.
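Both properties of the least squares line can be observed numerically. The sketch below is an illustration under stated assumptions: it reuses formulas (5) for m and k, takes the data of Example 2, and compares the sum of squared residuals with the sums obtained from a few arbitrarily chosen neighboring lines. The function names are assumptions of this example.

```python
def fit_case_one(xs, ys):
    """m and k from the normal equations, written as in formulas (5)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    d = sum_x * sum_x - n * sum_xx
    m = (sum_y * sum_x - n * sum_xy) / d
    k = (sum_x * sum_xy - sum_y * sum_xx) / d
    return m, k


def sum_of_squares(m, k, xs, ys):
    """Sum of the squared residuals d = y - (m*x + k)."""
    return sum((y - (m * x + k)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]

m, k = fit_case_one(xs, ys)
residuals = [y - (m * x + k) for x, y in zip(xs, ys)]
best = sum_of_squares(m, k, xs, ys)

# Arbitrary small changes in m and k, for comparison only.
nearby = [sum_of_squares(m + dm, k + dk, xs, ys)
          for dm in (-0.1, 0, 0.1)
          for dk in (-0.5, 0, 0.5)
          if (dm, dk) != (0, 0)]

print(f"sum of residuals = {sum(residuals):.2e}")
print(f"sum of squares at the fitted line = {best:.3f}")
print(f"smallest sum among the neighboring lines = {min(nearby):.3f}")
```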


Case II. In Case I distances between the points and the line were
taken parallel to the y-axis. But we may just as logically, from a
formal point of view, take distances parallel to the x-axis, and
make the x residuals the basis for a least squares criterion of best
fit. Similarly, for the method of moments: we can set up two
equations such that the first moment of the observed x’s equals
the first moment of the computed x’s, and the zeroth moment of the
observed x’s equals the zeroth moment of the computed. To do this
let $x = m_2 y + b$ represent the equation of the line. Then by the
principle of moments we have

$$\begin{cases} \sum x = \sum (m_2 y + b) \\ \sum xy = \sum y\,(m_2 y + b). \end{cases}$$

[Figure, Case II: residual $d_i = x_i - (m_2 y_i + b)$, measured parallel to the x-axis]

Solving for $m_2$ and b we obtain

(7)    $$m_2 = \frac{(\sum x)(\sum y) - N\sum xy}{D}, \qquad b = \frac{(\sum y)(\sum xy) - \sum x \sum y^2}{D}, \qquad D = \left(\sum y\right)^2 - N\sum y^2.$$

If we determined $m_2$ and b by making the sum of the squares of
the x residuals a minimum we would get the results given in (7).
The expressions in (7) are those of (5) with x and y interchanged.
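Because (7) is (5) with x and y interchanged, one routine serves for both cases. The sketch below is illustrative only: `fit_line` is a name assumed for this example, the data are those of Example 2, and the Case II line is re-expressed with y as the subject so that its slope can be set beside that of Case I.

```python
def fit_line(us, vs):
    """Slope and intercept of v = slope*u + intercept, by formulas (5)."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    sum_uu = sum(u * u for u in us)
    d = sum_u * sum_u - n * sum_uu
    slope = (sum_v * sum_u - n * sum_uv) / d
    intercept = (sum_u * sum_uv - sum_v * sum_uu) / d
    return slope, intercept


xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]

# Case I: y = m*x + k, residuals taken parallel to the y-axis.
m, k = fit_line(xs, ys)

# Case II: x = m2*y + b, the same formulas with x and y interchanged.
m2, b = fit_line(ys, xs)

# The Case II line solved for y, for comparison with Case I.
slope_case_two = 1 / m2
intercept_case_two = -b / m2

print(f"Case I : y = {m:.3f}x + {k:.2f}")
print(f"Case II: x = {m2:.4f}y + ({b:.3f})")
print(f"Case II solved for y: y = {slope_case_two:.3f}x + {intercept_case_two:.2f}")
```

For these data the two lines are close but not identical, which agrees with the remark that Cases I and II in general give different lines.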

In general, Cases I and II will give different lines. Case I assumes
that the observed points fail to fall on the line because of errors
in the ordinates only. Case II assumes that only the x-coordinates
are in error. In the application of curve fitting to economic data,
etc., the formal mathematical procedure should not be used without
first verifying that the underlying assumptions involved in the procedure
are justified. Inasmuch as the independent variable x can
be controlled in experimental and observational data, the errors
usually exist only in the y’s. Therefore, in speaking of the best
line by the method of moments or least squares it is conventional
to mean the line which fits best in the sense of (5) rather than (7).

Case III ( for calculus students ). 

A third line can be obtained 
which fits best in the sense that 
the sum of the squares of the per- 
pendicular distances from the points to the line is a minimum. 

Let us suppose the equation of this line to be in the form

$$y' = mx' + k$$

where $x' = x - \bar{x}$, $y' = y - \bar{y}$, and $(\bar{x}, \bar{y})$ is the mean of the
observed data. The distance $d_i$ from this line to a point $(x_i', y_i')$
representing a pair of observed values (referred to their respective
means as origin) is, from analytics,

$$d_i = \frac{y_i' - mx_i' - k}{\sqrt{m^2 + 1}}.$$

[Figure: Case III]

We wish to make $\frac{1}{N}\sum_{1}^{N} d_i^2$ a minimum. Therefore we are to choose
m and k so that the function

$$f(m, k) = \frac{1}{N}\sum_{1}^{N} \frac{(y_i' - mx_i' - k)^2}{m^2 + 1}$$

is a minimum. This function may be written in the form

$$f(m, k) = \frac{1}{m^2 + 1}\left(\sigma_y^2 + k^2 + m^2\sigma_x^2 - 2mr\sigma_y\sigma_x\right)$$

where r is a convenient symbol defined by the relation

$$r\sigma_y\sigma_x = \frac{1}{N}\sum_{1}^{N} x_i' y_i'.$$

To make f(m, k) a minimum we first put k = 0. Then we equate
to zero the first derivative with respect to m and obtain

$$m^2 r\sigma_y\sigma_x - m(\sigma_y^2 - \sigma_x^2) - r\sigma_y\sigma_x = 0.$$

Solving for m we have

$$m = \frac{(\sigma_y^2 - \sigma_x^2) \pm \sqrt{(\sigma_y^2 - \sigma_x^2)^2 + 4r^2\sigma_y^2\sigma_x^2}}{2r\sigma_y\sigma_x}.$$

Therefore the required equation is $y' = mx'$. Referred to the origin
of x and y, this is

$$y - \bar{y} = m(x - \bar{x})$$

where m is determined above.

This line is the appropriate one to fit if there are errors in both x 
and y of the empirical data. 
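As a rough numerical check of Case III, the sketch below evaluates the slope formula in Python. The function name is hypothetical. One point is an assumption added here rather than taken from the text: of the two roots, the sketch takes the one with the plus sign, which has the same sign as r and is the root that makes the sum of squared perpendicular distances a minimum (the other root gives the perpendicular direction).

```python
import math


def perpendicular_fit(xs, ys):
    """Case III: return (m, intercept) of the line minimising perpendicular distances."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n                     # sigma_x^2
    var_y = sum((y - ybar) ** 2 for y in ys) / n                     # sigma_y^2
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n   # r*sigma_x*sigma_y
    if cov == 0:
        raise ValueError("r = 0: the slope formula is undefined")
    diff = var_y - var_x
    # Plus-sign root of the quadratic in m; it has the same sign as cov.
    m = (diff + math.sqrt(diff ** 2 + 4 * cov ** 2)) / (2 * cov)
    return m, ybar - m * xbar


# Points lying exactly on y = 2x + 1 should be recovered exactly.
print(perpendicular_fit([0, 1, 2, 3], [1, 3, 5, 7]))
```

When applied to scattered data, the Case III slope generally lies between the Case I slope and the Case II slope (the latter expressed as dy/dx).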

A special problem under Case I. Sometimes problems arise where 
the line to be fitted is restricted in some way. For example, the 
nature of the problem may require that the line shall pass through 
the origin. If this condition is imposed, (1) takes the form 

y = mx. 

The least squares estimate of the slope of this line depends upon 
various assumptions about the errors. If y is subject to error and 
x is free of error, and if the observations are all of equal weight, it is 
easy to show that 

$$m = \frac{\sum xy}{\sum x^2}$$

by the principle of least squares. This principle will give different 
estimates of m under different assumptions about the weights of 
the observations. Several particular solutions of the more general 
problem and some applications will be found in §15 of reference 4 
on page 6. (See also our Exercise 11, p. 189.) 
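A minimal Python sketch of the equal-weight estimate for a line through the origin follows. The function name and the first sample data set are hypothetical; the second data set is the tensile-test table of Exercise 7 later in this chapter, used here only on the assumption that a line through the origin is a reasonable model for it.

```python
def slope_through_origin(xs, ys):
    """Least squares slope of y = m*x when only y is in error and weights are equal."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is undetermined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


print(slope_through_origin([1, 2, 3], [2, 4, 6]))
print(slope_through_origin([1, 2, 3, 4, 5], [14, 27, 40, 55, 68]))
```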

Exercises 

1. Fit a line to the following data by Case I:

    x |  6   7   7   8   8   8   9   9  10
    y |  5   5   4   5   4   3   4   3   3

    Ans. y = -.5x + 8.

Ijxy 
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2. Show that — 0 for Exercise 1. 

3. Using the values given in (5) for m and k show that $\sum [y - (mx + k)] = 0$.

4. Verify the expressions for $m_2$ and $b$ given in (7). How would you modify
the “alternate procedure” so it will apply to $m_2$ and $b$?

5. Fit a line to the data of Example 2 by the method of Case II.

6. Show that the formulas in (5) fail when the x’s are all equal. Hint. Replace
x by a constant c in the denominator D.

9. Simplification. The formulas for m and k may be simplified.
For certain purposes it may be desirable to make the transformations
$x' = x - \bar{x}$ and $y' = y - \bar{y}$. This has the effect, graphically, of
translating the origin to the point $(\bar{x}, \bar{y})$ so that the y-axis is moved
to the value $\bar{x}$, and the x-axis is moved to the value $\bar{y}$. Let the
equation of the line with reference to these new axes be $y' = m_1 x' + k_1$.
The formulas for $m_1$ and $k_1$ will be the same as for m and k except
that x will be replaced by x' and y by y'. Hence

$$m_1 = \frac{N\sum x'y' - \sum x' \sum y'}{N\sum x'^2 - (\sum x')^2}$$

$$k_1 = \frac{\sum x'^2 \sum y' - \sum x' \sum x'y'}{N\sum x'^2 - (\sum x')^2}.$$

But since x' is a deviation from the mean of x, $\sum x' = 0$. Similarly,
$\sum y' = 0$. Hence the values of $m_1$ and $k_1$ reduce to

(8) $$m_1 = \frac{\sum x'y'}{\sum x'^2}, \qquad k_1 = 0.$$

Therefore the line goes through the new origin, and its equation is

(9) $$y' = m_1 x'$$

where $m_1$ is defined in (8).

The above transformation may not lighten the computations unless
the values of x or y are equispaced. However, it does simplify
the theory in certain applications, particularly in correlation theory
(Chapter VIII).

10. Time Series. If one of the variables is time, as in Examples 1 
and 2, the data are called a time series. The best fitting line is then 
commonly called a trend line or trend. In the process of fitting a 
trend line, a first simplification, obviously, is to take the origin at one 
of the given dates as we did in Example 3. But a much greater 
simplification is possible, if the x’s are equispaced, as they usually 
are in a time series. Denote the common difference of the x’s by c
and the mid-date by $\bar{x}$. Then we may shift the origin to $\bar{x}$ and change
the unit of measurement along the horizontal axis to c. Thus we may
let

(10) $$t = \frac{x - \bar{x}}{c}$$

where

(11) $$\bar{x} = \frac{x_1 + x_N}{2}$$

if the x’s are equispaced.

Let us think now of our line in (t, y) coordinates, and let its equation
be y = at + b. Our problem is to find a and b numerically from
the given data, as we found m and k before. Our normal equations
will be

$$\sum y = \sum (at + b)$$
$$\sum ty = \sum (at + b)\,t.$$

Since $\sum t = \frac{1}{c}\sum (x - \bar{x}) = 0$, and $\sum b = Nb$, the above equations
are readily solved, giving

(12) $$a = \frac{\sum ty}{\sum t^2}, \qquad b = \frac{\sum y}{N}.$$

The student should remember that this simplification can be used 
only when the x’s are equispaced. 
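The shortcut in (10) to (12) can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive routine: the function name is hypothetical, the x values are assumed to be sorted, and the sample data are those of Example 5.

```python
def trend_line(xs, ys):
    """Return (a, b, xbar, c) for y = a*t + b with t = (x - xbar)/c, per (10)-(12)."""
    n = len(xs)
    c = xs[1] - xs[0]                              # common difference of the x's
    if any(abs((xs[i + 1] - xs[i]) - c) > 1e-12 for i in range(n - 1)):
        raise ValueError("the x values must be equispaced")
    xbar = (xs[0] + xs[-1]) / 2                    # formula (11)
    ts = [(x - xbar) / c for x in xs]              # formula (10)
    a = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
    b = sum(ys) / n                                # formula (12)
    return a, b, xbar, c


a, b, xbar, c = trend_line([0, 5, 10, 15, 20], [12, 15, 17, 22, 24])
print("y = %.2f t + %.2f" % (a, b))
# Back in terms of x: slope a/c and intercept b - a*xbar/c.
print("y = %.2f x + %.2f" % (a / c, b - a * xbar / c))
```

The guard on equal spacing reflects the warning in the text: when the x’s are not equispaced, the sum of t is not zero and (12) no longer applies.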

Example 5. Find the trend line for the following data. Here c = 5, and from
(11) $\bar{x} = 10$.

       x      y      t     ty    t^2
       0     12     -2    -24     4
       5     15     -1    -15     1
      10     17      0      0     0
      15     22      1     22     1
      20     24      2     48     4
    Sums     90            31    10

From (12),

$$a = \frac{31}{10} = 3.1, \qquad b = \frac{90}{5} = 18.$$

So the required equation is y = 3.1t + 18, with reference to the new origin and
units. If we wish it in terms of x, we substitute

$$t = \frac{x - 10}{5}$$

and obtain y = .62x + 11.8.

Example 6. Same as Example 5, with another observation added. Note that
when there is an even number of observations, the values of t are fractional.
In this case it is convenient to use the column headings 2ty instead of ty, and
$4t^2$ instead of $t^2$.

       x      y      t     2ty   4t^2
       0     12   -5/2     -60    25
       5     15   -3/2     -45     9
      10     17   -1/2     -17     1
      15     22    1/2      22     1
      20     24    3/2      72     9
      25     30    5/2     150    25
    Sums    120            122    70

$$\bar{x} = 12.5, \qquad \sum ty = 61, \qquad \sum t^2 = 17.5$$

$$a = 3.49, \qquad b = 20$$

$$y = 3.49t + 20$$

$$y = 3.49\left(\frac{x - 12.5}{5}\right) + 20$$

$$y = .7x + 11.28.$$
11. Exponential Trends. When the given y values form a geometric
progression while the corresponding x values form an arithmetic
progression, the relationship between the variables is given
by an exponential function, and the best fitting curve is said to 
describe an exponential trend. Data from the fields of biology, 
banking, and economics frequently exhibit such a trend. Thus the 
growth of bacteria is exponential. Money accumulating at com- 
pound interest follows the same kind of law of growth. And in busi- 
ness, sales or earnings may grow exponentially over a short period. 
Another familiar example is the increase in friction as a rope is 
coiled around a post. As the number of coils increases in arith- 
metic progression, the friction increases in geometric progression. 1 
This explains why a few turns of the hawsers around the bitts at the 
wharf is sufficient to hold a large ship. 

The characteristic property of this law is that the rate of growth,
that is, the rate of change of y with respect to x, at any value of x is
proportional to the value of the function for that value of x. The
function

(13) $$y = Ae^{Bx}$$

has this property. 2 The letter e is a fixed constant, whereas A and
B are parameters to be determined from the data. If y decreases
as x increases, B is negative. An interesting example of this case is
the disappearance of radioactive substances like radium.

Fig. 28. General Appearance of the Graph of (13) for x ≥ 0 and A > 0.

To assume that the apparent law of growth will continue is usually
unwarranted, so only short range predictions can be made with any
considerable degree of reliability. When the exponential character
of the observed phenomenon ceases a saturation point is said to be
reached.

1 Elementary Mathematical Analysis, C. S. Slichter. McGraw-Hill.

2 The student of calculus will understand that “rate of change” is used here in
the derivative sense. For (13), dy/dx = By.

The parameters A and B. If we transform (13) so that it is linear
with respect to its parameters we may use the methods for fitting
a straight line to determine A and B. To this end we first take the
logarithms (to base 10) of both sides of (13), obtaining

(14) $$\log y = \log A + (B \log e)\,x$$

which is of the form

(15) $$Y = k + mx$$

where $Y = \log y$, $k = \log A$, $m = B \log e$.

If we look up the logarithms of the given y’s and denote them by Y,
we may fit the equation Y = mx + k to the (x, Y) values by determining
m and k by means of the formulas given in (5). In using
these formulas we must remember to replace y by Y. After m and k
are determined, A and B may be obtained from the relations

$$A = \text{antilog of } k$$

$$B = \frac{m}{\log e}, \quad \text{where } \log e = \log 2.718 = .4343.$$

The student may be interested to verify that the relation Y = mx + k
can be put back into the form (13). We may write (14) in the form

$$y = 10^{\log A + (B \log e)x} = \{10^{\log A}\}\{10^{\log e}\}^{Bx} = Ae^{Bx}.$$

The last step follows because $10^{\log_{10} N} = N$ by definition of logarithm.
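The procedure just described, taking logarithms, fitting a line by (5), and converting m and k back to A and B, might be sketched in Python as below. The function name is hypothetical, and the sample data are those of Example 7. Because the sketch uses full-precision logarithms rather than four-place tables, its results can differ slightly in the last decimal from hand computation.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A*e**(B*x) through a least squares line Y = m*x + k, Y = log10(y)."""
    big_y = [math.log10(y) for y in ys]       # requires every y > 0
    n = len(xs)
    sx, sy = sum(xs), sum(big_y)
    sxy = sum(x * v for x, v in zip(xs, big_y))
    sxx = sum(x * x for x in xs)
    d = sx * sx - n * sxx                      # D of (5)
    m = (sy * sx - n * sxy) / d
    k = (sx * sxy - sy * sxx) / d
    a = 10 ** k                                # A = antilog of k
    b = m / math.log10(math.e)                 # B = m / log e
    return a, b, m, k


a, b, m, k = fit_exponential([1, 2, 3, 4, 5], [1.6, 4.5, 13.8, 40.2, 125.0])
print("Y = %.4f x + (%.4f)" % (m, k))
print("y = %.4f * exp(%.4f x)" % (a, b))
```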

Example 7. Find the exponential trend for the following data, and draw the
curve.

       x        y         Y         xY     x^2
       1       1.6     .2041      .2041      1
       2       4.5     .6532     1.3064      4
       3      13.8    1.1399     3.4197      9
       4      40.2    1.6042     6.4168     16
       5     125.0    2.0969    10.4845     25
    Sums 15            5.6983    21.8315     55

From (5) we have,

$$D = \Big(\sum x\Big)^2 - N\sum x^2$$

$$m = \frac{1}{D}\Big[\sum Y \sum x - N\sum xY\Big]$$

$$k = \frac{1}{D}\Big[\sum x \sum xY - \sum Y \sum x^2\Big].$$

Therefore,

$$D = (15)^2 - 5(55) = -50$$

$$m = -\tfrac{1}{50}\,[(5.6983)(15) - 5(21.8315)] = .4737$$

$$k = -\tfrac{1}{50}\,[15(21.8315) - (5.6983)(55)] = -.2813 = 9.7187 - 10.$$

And log A = 9.7187 - 10, hence A = .5232. Also B = m / log e = .4737/.4343 = 1.09.

Therefore the required equation is

$$y = .5232\,e^{1.09x}.$$

When the x’s are equispaced, as here, the work may be simplified by using (10)
and fitting a line

$$Y = at + b.$$

The problem now is essentially the same 1 as in §10 where a and b are defined in
(12) except that we are now dealing with (t, Y) coordinates instead of (t, y).
The method is illustrated below.

            t          Y         tY     t^2
           -2       .2041     -.4082      4
           -1       .6532     -.6532      1
            0      1.1399      .0000      0
            1      1.6042     1.6042      1
            2      2.0969     4.1938      4
    t = x - 3      5.6983     4.7366     10

1 The critical reader will realize that fitting a straight line to the values of log y
is not quite the same as fitting an exponential to the values of y. However, the
discrepancy usually does not affect the fit seriously. For a method which is free
from this difficulty, see Glover's Tables, p. 468.

From (12)

$$a = \frac{\sum tY}{\sum t^2} = \frac{4.7366}{10} = .4737$$

$$b = \frac{\sum Y}{N} = \frac{5.6983}{5} = 1.1397.$$

So

$$Y = .4737t + 1.1397.$$

Transforming this into (x, Y) coordinates we have

$$Y = .4737(x - 3) + 1.1397 = .4737x - .2814$$

as before.

For purposes of plotting, predicting, or interpolating, values of y in (13) may 
be obtained by means of the intermediate form (15). So, to sketch the curve 



Fig 29 

for this example, we first assign values to x in the last equation, compute the 
corresponding values of Y, and then obtain the values of y from a table of loga- 
rithms. These values are given in the following table. The curve in Figure 29 
is sketched from the (x, y ) values in this table. 
12. Further Remarks on the Exponential Function. Equation (13)
is sometimes called the compound interest law because it describes
the way money would grow if interest were compounded continuously.
If P dollars are invested at a nominal rate j (expressed as a decimal)
compounded m times a year, the amount S after x years is given by the formula

$$S = P\left(1 + \frac{j}{m}\right)^{mx}.$$

If j is compounded continuously or, in other words, if m is taken
indefinitely large (written $m \to \infty$), the amount S does not increase
indefinitely but approaches a limiting value. We may write the
expression for S in the form

$$S = P\left[\left(1 + \frac{j}{m}\right)^{m/j}\right]^{jx}.$$

If we let N = m/j, we have

$$S = P\left[\left(1 + \frac{1}{N}\right)^{N}\right]^{jx}.$$

It can be shown in the calculus 1 that, as $N \to \infty$, the quantity

$$\left(1 + \frac{1}{N}\right)^{N}$$

approaches the limit called e. Thus we have

$$\lim_{N \to \infty}\left(1 + \frac{1}{N}\right)^{N} = e = 2.718\cdots$$

This limit is also the base of the Napierian, or natural, system of
logarithms. As $m \to \infty$ so does $N \to \infty$. Therefore in the ideal case
of continuous conversion of interest, we have the limiting form

$$S = \lim_{m \to \infty} P\left(1 + \frac{j}{m}\right)^{mx} = \lim_{N \to \infty} P\left[\left(1 + \frac{1}{N}\right)^{N}\right]^{jx}$$

that is

$$S = Pe^{jx}$$


which is of the form (13). 
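The approach of the compound amount to its limiting value can be observed numerically. In the Python sketch below the function names and the figures for principal, rate, and term are hypothetical, and the nominal rate j is entered as a decimal.

```python
import math


def compound_amount(p, j, m, x):
    """Amount after x years at nominal rate j (a decimal) compounded m times a year."""
    return p * (1 + j / m) ** (m * x)


def continuous_amount(p, j, x):
    """Limiting amount S = P*e**(j*x) under continuous compounding."""
    return p * math.exp(j * x)


p, j, x = 100.0, 0.06, 10          # hypothetical principal, rate, and term
for m in (1, 4, 12, 365, 10 ** 6):
    print(m, compound_amount(p, j, m, x))
print("limit", continuous_amount(p, j, x))
```

The printed amounts increase with m but remain below the continuous figure, in agreement with the statement that S approaches a limiting value rather than increasing indefinitely.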

There are several other forms of the exponential function. For
example, if we let $r = e^B$, (13) becomes

$$y = Ar^x$$

which is the general term of a geometric progression whose first term
is A and common ratio is r.

1 The teacher can give appropriate references.

If B is negative in $r = e^B$ then r < 1. So (13) is a decreasing function
when B is negative.

If we let $10^k = e^B$, (13) becomes

$$y = A\,10^{kx}.$$

Then $k = B \log_{10} e$ and k differs from B by the factor $\log_{10} e$. This
factor is known as the modulus of the system of logarithms of base 10
with respect to the system of base e.

The value of the reciprocal of the modulus

$$\frac{1}{\log_{10} e} = 2.3025851\cdots$$

is often useful. For example, suppose that the logarithm to base e
is required for a given number N and tables to base 10 only are
available. Let $\log_e N = x$. Then $e^x = N$, and $x \log_{10} e = \log_{10} N$,
whence $x = \log_{10} N / \log_{10} e \approx 2.303 \log_{10} N$. (Hereafter, the base 10
will be understood unless otherwise indicated.)

13. Ratio Charts. In the graphical representation of data that 
exhibit an exponential trend, it is often desirable to use semi-logarith- 
mic paper. Such paper has a logarithmic scale in the vertical direc- 
tion and a uniform scale in the horizontal direction. (Figure 30.) A 
logarithmic scale is one in which the distance from y = 1 to y = N
equals log N. A “cycle” of rulings spaced according to the logarithms
of the integers from 1 to 10 is the unit of the vertical log y
scale.

“ Semi-log ” paper may be constructed or purchased having one 
or more cycles. The appropriate number of cycles is determined 
by the range of y values in the data to be plotted. If the bottom line 
of the first cycle is labeled 1 and taken as the origin of log y 
(log 1 = 0), the beginning of the next cycle is read 10 (log 10 = 1), the 
next one above that is read 100 (log 100 = 2), etc. However, the 
beginning of the first cycle may be labeled with any number which 
is an integral power (positive or negative) of 10, as .01, .1, 10, 100, etc. 
Corresponding lines in successive cycles are labeled with numbers 
which are 10 times those in the preceding cycle. Since y has no real
logarithm if y ≤ 0, neither zero nor negative numbers are found on
a logarithmic scale. Plotting a point whose semi-logarithmic
coordinates are (x, y) is equivalent to plotting the point whose
rectangular coordinates are (x, log y).

Example 8. Plot $y = 8(2^x)$ on semi-log paper.

Solution. Assigning values to x we form the following table,

    x |  -3   -2   -1    0    1    2    3    4
    y |   1    2    4    8   16   32   64  128

from which we obtain the semi-logarithmic graph shown in Figure 30.

We now state the following theorem. 

Theorem II. If A is a positive constant, the (x, log y)-graph of
$y = Ae^{Bx}$ is a straight line.

[Figure 30]

Proof: Since (15) is linear in x and Y, its graph in (x, Y) rectangular
coordinates is a straight line.

Semi-logarithmic graphs are also called ratio charts. Their usefulness
depends upon the property of logarithms that

$$\log \frac{M}{N} = \log M - \log N.$$

It follows that the distance between any two ordinates of the chart
measures the ratio between the values represented by these ordinates.
Thus if

$$\frac{y_1}{y_2} = \frac{y_3}{y_4}$$

then

$$\log y_1 - \log y_2 = \log y_3 - \log y_4$$

or

$$Y_1 - Y_2 = Y_3 - Y_4,$$

that is, equal ratios are represented by equal vertical distances.
Likewise, if

$$\frac{y_1}{y_2} > \frac{y_3}{y_4}$$

then

$$Y_1 - Y_2 > Y_3 - Y_4$$

and the larger ratio is represented graphically by the larger distance.
These differences of elevation are independent of any base line. 
The same percentage increase in y is represented by the same addition 
to the height of Y in all parts of the chart. Hence, it is easier to 
depict and discover percentage changes on ratio charts than on 
ordinary charts. 

The analysis of time series in economic statistics is often facilitated
by forming “link relatives” which are ratios of each ordinate (after
the first) to the preceding ordinate. Thus, if $y_1, y_2, \cdots, y_n$ are the
given values, the link relatives are

$$R_1 = \frac{y_2}{y_1}, \quad R_2 = \frac{y_3}{y_2}, \quad \cdots, \quad R_{n-1} = \frac{y_n}{y_{n-1}}.$$

Any link relative R denotes the percentage change in y from one 
month (say) to the next. If the y’s are plotted on ratio paper they 
will lie on a straight line when the R ’ s are equal, on a curve bending 
upward when the R’s are increasing, and on a curve bending down- 
ward when the R’s are decreasing. It follows that if two curves are 
parallel on ratio paper their rate of increase (or decrease) is the same. 
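The connection between equal link relatives and a straight line on ratio paper can be illustrated with a short Python sketch. The function names are hypothetical, and the sample values are the ordinates of Example 8. Each R is a ratio; the corresponding percentage change is 100(R - 1).

```python
import math


def link_relatives(ys):
    """Ratio of each ordinate (after the first) to the preceding ordinate."""
    return [later / earlier for earlier, later in zip(ys, ys[1:])]


def log_steps(ys):
    """Vertical rise on ratio paper between successive points: differences of log10 y."""
    logs = [math.log10(y) for y in ys]
    return [b - a for a, b in zip(logs, logs[1:])]


ys = [1, 2, 4, 8, 16, 32, 64, 128]
print(link_relatives(ys))   # equal ratios ...
print(log_steps(ys))        # ... give equal vertical steps, hence a straight line
```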

For further discussion of ratio charts the student is referred to the 
books of Bivins and Haskell (see §7, Introduction). 

Graphical determination of exponential function. It follows from
Theorem II that data giving a straight line when plotted on
semi-logarithmic paper (with x on the uniform and y on the logarithmic
scale) satisfy an equation of the form (13). Suppose that the
(straight line) graph has been drawn and one desires the exponential
function which the line represents and the data satisfy. The constants
A and B in (13) can be approximated by the following method. 1
We first observe that the slope of the line represented by (15) is
given by

$$m = B \log e = \frac{Y_2 - Y_1}{x_2 - x_1}.$$

To determine the numerical value of B, take one cycle of y (over
which the graph extends) from any starting point and read the
corresponding values of x (Figure 31a), so that

$$B = \frac{Y_2 - Y_1}{(x_2 - x_1)\log e} = \frac{\log (y_2/y_1)}{\Delta x \log e} = \frac{\log 10}{\Delta x \log e} = \frac{2.303}{\Delta x}.$$

Fig. 31 (a), (b)

In case the graph does not extend over one cycle, determine x for
y = e and y = 1; then (Figure 31b)

$$B = \frac{\log e}{\Delta x \log e} = \frac{1}{\Delta x}.$$

The sign of B is of course positive if the graph has a positive slope
in the ordinary sense and is negative for a negative slope.

If the graph intersects the line x = 0, the value of A can be read
off at this intersection. If, in the data involved, the graph does not
intersect the line x = 0, A can usually be determined by finding y
for some convenient value of x such that Bx = some integer n,
whereupon $A = y/e^n$ from equation (13).

In practical problems, the plotted points representing the data
will not usually fall exactly on a straight line. But if they exhibit
a linear trend one may draw (with the aid of a transparent ruler)
the line that seems to fit them best. Then proceed as above.

1 Note on Semi-Logarithmic Graphs, W. T. Lenser, The American
Mathematical Monthly, vol. 49 (1942), pp. 611-613.

Example 9. The uniform scale along the horizontal axis of a sheet of semi- 
logarithmic paper ranges from 0 to 10; along the vertical axis the logarithmic 
scale ranges from 100 to 1000. A straight line is drawn on the paper from the 
upper endpoint of the vertical scale to the midpoint of the horizontal scale. 
Determine (i) the equation of the exponential function represented by the line, 
(ii) the equation of the line in (x, Y) coordinates. 

Solution 1, using above method. A = 1000. B = -1/(5 log e) = (-2.3)/5
= -0.46. Hence, the desired equation (i) is

$$y = 1000e^{-0.46x}.$$

The slope of the line is m = B log e = -1/5 and its equation (ii) is

$$Y = 3 - 0.2x.$$

Solution 2. The line goes through the points (0, 1000) and (5, 100).
Substitution of the first pair of coordinates into (13) gives A = 1000. Substitution
of the second pair into $y = 1000e^{Bx}$ gives $100 = 1000e^{5B}$. Then $e^{-5B} = 10$ and
$-5B = \log_e 10 = 2.303$, whence B = -0.46.
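Solution 2 generalizes to any two points read off a straight line on semi-logarithmic paper. The Python sketch below is one way to carry it out; the function name is hypothetical, and both ordinates are assumed to be positive.

```python
import math


def exponential_through(p1, p2):
    """Return (A, B) of y = A*e**(B*x) passing through two points with y > 0."""
    (x1, y1), (x2, y2) = p1, p2
    b = math.log(y2 / y1) / (x2 - x1)   # natural log, so no factor log e is needed
    a = y1 / math.exp(b * x1)
    return a, b


a, b = exponential_through((0, 1000), (5, 100))
print("y = %.0f * exp(%.2f x)" % (a, b))
# Equation of the line in (x, Y) coordinates: Y = log10(A) + (B*log10(e))*x.
print("Y = %.1f + (%.1f) x" % (math.log10(a), b * math.log10(math.e)))
```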

14. Logarithmic Coordinate Paper. A function of the form

(16) $$y = kx^m$$

is called a power function. If k > 0 we have

(17) $$Y = K + mX$$

where the capital letters denote the logarithms of the corresponding
lower-case letters. Form (17) suggests the usefulness of logarithmic
coordinate paper on which the rulings in both directions are at distances
from the origin that are proportional to the logarithms of
the numbers represented. To mark on this paper a point whose
ordinary coordinates are $(X_1, Y_1)$ we plot the point whose rulings
correspond to the numbers $x_1$ and $y_1$.

It is evident from (17) that the graph of (16) is a straight line on 
logarithmic coordinate paper. It also follows from (17) that the 
problem of fitting a curve of the form (16) to a set of observations 
can be reduced to the problem of fitting a straight line. 

Example 10. A straight line is drawn on logarithmic coordinate paper through
the points (4, 16) and (6, 54). Determine the function y = f(x) which has that
line as its graph.

Solution 1. Substitution of the coordinates of the given points into (16) gives

$$16 = k(4^m)$$
$$54 = k(6^m).$$

Upon dividing each member of the first equation by the corresponding member
of the second, we obtain $8/27 = (2/3)^m$ whence by inspection m = 3. Then
k = 1/4, and the required function is $4y = x^3$.

Solution 2. Substitution of the logarithms of the given coordinates into (17)
gives

$$1.20412 = K + 0.60206m$$
$$1.73239 = K + 0.77815m.$$

Solving, m = 3 and K = -.60206 = 9.39794 - 10, k = .25.
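The two-point computation of Solution 2 can be written compactly in Python. The function name is hypothetical, and all coordinates are assumed to be positive so that their logarithms exist.

```python
import math


def power_through(p1, p2):
    """Return (k, m) of y = k*x**m passing through two points with positive coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    m = math.log10(y2 / y1) / math.log10(x2 / x1)   # slope of (17) on log-log paper
    k = y1 / x1 ** m
    return k, m


k, m = power_through((4, 16), (6, 54))
print("y = %.2f * x**%.0f" % (k, m))
```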

15. Parabolic Trend. Data of broad economic or social significance
extending over a long period of years may often be described
by an arc of a second degree parabola. The equation of a parabola
is of the form

$$y = \alpha + \beta x + \gamma x^2$$

where $\alpha$, $\beta$, $\gamma$ are the parameters to be determined.

If the x’s are equispaced we may let

$$t = \frac{x - \bar{x}}{c},$$

where $\bar{x} = (x_1 + x_N)/2$ and $c = |x_{i+1} - x_i|$, and thereby effect
considerable simplification in evaluating the constants. In t and y
coordinates the equation will, of course, involve different constants
and we may write its equation in the form

(18) $$y = A + Bt + Ct^2.$$

The method of moments may again be used and since (18) is a polynomial
this method also gives the best fitting curve in a least squares
sense. Because there are three constants to be determined we must
equate the second moments as well as the zeroth and first moments.
Imposing these conditions of moments between the observed and
computed ordinates, we obtain the three normal equations:

$$\sum y = NA + B\sum t + C\sum t^2$$
$$\sum ty = A\sum t + B\sum t^2 + C\sum t^3$$
$$\sum t^2 y = A\sum t^2 + B\sum t^3 + C\sum t^4.$$

Since the mean is chosen as origin $\sum t = 0$. With this choice of
origin and because the x’s are equispaced it can be shown that
$\sum t^3 = 0$. Therefore the normal equations simplify into

(19)
$$B = \frac{\sum ty}{\sum t^2}$$
$$NA + C\sum t^2 = \sum y$$
$$A\sum t^2 + C\sum t^4 = \sum t^2 y.$$

When the summations involved in these equations are evaluated
from the data the values of A, B, and C can easily be determined.


Example 11. Fit a parabola to the following data.

Number of Divorces per 1000 Marriages in the United States, 1900-1930

    Year      y      x     t     ty    t^2   t^2 y   t^4
    1900     81      0    -3   -243     9     729     81
    1905     84      5    -2   -168     4     336     16
    1910     88     10    -1    -88     1      88      1
    1915    104     15     0      0     0       0      0
    1920    134     20     1    134     1     134      1
    1925    148     25     2    296     4     592     16
    1930    170     30     3    510     9    1530     81
    Sums    809  x̄ = 15         441    28    3409    196

From (19),

$$B = \frac{441}{28}$$
$$7A + 28C = 809$$
$$28A + 196C = 3409.$$

Solving the last two equations simultaneously we obtain,

$$A = \frac{322}{3}, \qquad C = \frac{173}{84}.$$

Therefore,

$$y = \frac{322}{3} + \frac{441}{28}\,t + \frac{173}{84}\,t^2.$$

If we desire the equation in the original form we substitute $t = \frac{1}{5}(x - 15)$ and
obtain

$$y = \frac{322}{3} + \frac{441}{28}\left(\frac{x - 15}{5}\right) + \frac{173}{84}\left(\frac{x - 15}{5}\right)^2$$

which simplifies into

$$y = 78.62 + .68x + .0824x^2.$$

Upon the hypothesis that divorces will continue to increase according to this
trend, we may estimate the number for 1950 for example. When x = 50 in
the above equation, we find y = 318.62.
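The simplified normal equations (19) lend themselves to a short program. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the x values are assumed to be sorted and equispaced, and the data are those of Example 11 with x measured in years from 1900.

```python
def fit_parabola_equispaced(xs, ys):
    """Return (A, B, C, predict) for y = A + B*t + C*t**2, t = (x - xbar)/c, per (19)."""
    n = len(xs)
    c = xs[1] - xs[0]
    xbar = (xs[0] + xs[-1]) / 2
    ts = [(x - xbar) / c for x in xs]
    st2 = sum(t ** 2 for t in ts)
    st4 = sum(t ** 4 for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    st2y = sum(t * t * y for t, y in zip(ts, ys))
    big_b = sty / st2
    det = n * st4 - st2 ** 2                  # determinant of the 2-by-2 system in A, C
    big_a = (sy * st4 - st2 * st2y) / det
    big_c = (n * st2y - st2 * sy) / det

    def predict(x):
        t = (x - xbar) / c
        return big_a + big_b * t + big_c * t * t

    return big_a, big_b, big_c, predict


xs = [0, 5, 10, 15, 20, 25, 30]
ys = [81, 84, 88, 104, 134, 148, 170]
A, B, C, predict = fit_parabola_equispaced(xs, ys)
print(A, B, C)
print(predict(50))
```

With unrounded coefficients the estimate for x = 50 comes out near 318.5; the figure 318.62 in the example results from substituting into the equation with rounded coefficients.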

16. The Gompertz Curve. The curve which bears his name was
suggested in 1825 by Gompertz for use in actuarial science. Recently
it has had some application as a growth curve in business and population
forecasting and in certain problems in education. Its equation 1 is

(20) $$y = kg^{c^x}.$$

To determine the parameters, we first transform (20) into the
logarithmic form

(20a) $$Y = K + Gc^x$$

where Y = log y, K = log k, G = log g. The number, N, of observations
available must be such that N = 3n where n is the number in
each of three subgroups with no observations omitted; that is, N
must be divided into three blocks of data consisting of n items each.
It is also necessary that the values of the independent variable x be
equispaced. Then the origin can be chosen so that x takes the
values 0, 1, 2, ..., 3n - 1. If these values of x are substituted in
(20a) we obtain the three sets of functional Y’s shown in (a), (b),
and (c).

    x        Y          Functional value              Subgroup total
    0        Y_0        Y_0 = K + Gc^0
    1        Y_1        Y_1 = K + Gc                  (a)  sum of Y_i, i = 0 to n-1
    ...
    n-1      Y_{n-1}    Y_{n-1} = K + Gc^{n-1}

    n        Y_n        Y_n = K + Gc^n
    n+1      Y_{n+1}    Y_{n+1} = K + Gc^{n+1}        (b)  sum of Y_i, i = n to 2n-1
    ...
    2n-1     Y_{2n-1}   Y_{2n-1} = K + Gc^{2n-1}

    2n       Y_{2n}     Y_{2n} = K + Gc^{2n}
    2n+1     Y_{2n+1}   Y_{2n+1} = K + Gc^{2n+1}      (c)  sum of Y_i, i = 2n to 3n-1
    ...
    3n-1     Y_{3n-1}   Y_{3n-1} = K + Gc^{3n-1}

1 For a derivation see Mathematical Theory of Life Insurance, Forsyth.
John Wiley and Sons, Inc.
Let $S_1$, $S_2$, $S_3$ denote respectively the totals of the subgroups (a),
(b), and (c). Thus we have

$$S_1 = nK + G(1 + c + \cdots + c^{n-1})$$
$$S_2 = nK + Gc^n(1 + c + \cdots + c^{n-1})$$
$$S_3 = nK + Gc^{2n}(1 + c + \cdots + c^{n-1}).$$

Then

$$S_2 - S_1 = G(c^n - 1)(1 + c + \cdots + c^{n-1})$$
$$S_3 - S_2 = Gc^n(c^n - 1)(1 + c + \cdots + c^{n-1})$$

whence we obtain

$$c^n = \frac{S_3 - S_2}{S_2 - S_1}.$$

Writing the expression for $S_2 - S_1$ in the form

$$S_2 - S_1 = G\,\frac{(c^n - 1)^2}{c - 1}$$

and solving for G, we obtain

$$G = \frac{(S_2 - S_1)(c - 1)}{(c^n - 1)^2}.$$

The expression for $S_1$ may be written

$$S_1 = nK + \frac{G(1 - c^n)}{1 - c}$$

so we have

$$K = \frac{1}{n}\left[S_1 - \frac{G(1 - c^n)}{1 - c}\right].$$

In the above expressions, $S_1$, $S_2$, $S_3$ denote sums of the functional
Y’s. If these are now replaced by the empirical data so that

$$S_1 = \sum_{0}^{n-1} Y_i, \qquad S_2 = \sum_{n}^{2n-1} Y_i, \qquad S_3 = \sum_{2n}^{3n-1} Y_i,$$

where $Y_i$ refers to the observed Y’s, then c can be determined
from the expression for $c^n$. Using the value of c, G can be determined,
and then K.

If c < 1, it is clear from (20a) that $Y \to K$ as $x \to \infty$. Then
y = k is an asymptote and k is sometimes called the ceiling of the
curve. (See Figure 32.)

For an application of the above method to a problem in business,
see Statistical Methods (Revised Edition) by Mills, page 672.
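The three-group method can be sketched in Python as follows. This is a minimal illustration, not a definitive routine: the function name is hypothetical, the observations are assumed to correspond to x = 0, 1, ..., 3n - 1, and the test data are synthetic values generated from assumed parameters k = 100, g = 0.1, c = 0.5 so that the method should return those values.

```python
import math


def fit_gompertz(ys):
    """Return (k, g, c) of y = k * g**(c**x) by the method of three subgroup totals."""
    if len(ys) == 0 or len(ys) % 3 != 0:
        raise ValueError("the number of observations must be a positive multiple of 3")
    n = len(ys) // 3
    big_y = [math.log10(y) for y in ys]               # Y = log y, requires y > 0
    s1 = sum(big_y[:n])
    s2 = sum(big_y[n:2 * n])
    s3 = sum(big_y[2 * n:])
    if s2 == s1:
        raise ValueError("S2 = S1: c**n is undefined")
    c_n = (s3 - s2) / (s2 - s1)                       # c**n
    if c_n <= 0 or c_n == 1:
        raise ValueError("the data do not determine a usable value of c")
    c = c_n ** (1 / n)
    big_g = (s2 - s1) * (c - 1) / (c_n - 1) ** 2      # G = log g
    big_k = (s1 - big_g * (1 - c_n) / (1 - c)) / n    # K = log k
    return 10 ** big_k, 10 ** big_g, c


# Synthetic data from assumed parameters k = 100, g = 0.1, c = 0.5.
ys = [100 * 0.1 ** (0.5 ** x) for x in range(6)]
print(fit_gompertz(ys))
```

Because the synthetic values lie exactly on a Gompertz curve, the parameters are recovered up to rounding; with observed data the subgroup totals only approximate the functional totals and the fit is correspondingly approximate.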
17. Remarks and References. The methods of least squares and
moments do not select the appropriate curve. They merely determine
the “best” values of the parameters in the equation of the
curve which has been selected previously to describe the observed 
data. The question of the type of curve which should be fitted to the 
data is not always easy to answer. The selection of the appropriate 
mathematical function depends to a large extent upon the investiga- 
tor’s experience in the field in which the problem lies and his knowl- 
edge of the properties of curves. It always helps to plot the data first. 
The usual requirements for practical purposes are that (a) the curve 
must represent well the trend of the empirical data, and ( b ) the 
mathematical expression must not involve too many parameters and 
those present must be calculable from the data. In dealing with 
time series, if the objective is to find out what would happen if the 
percentage change should continue as it has on the average in the past, 
then an exponential trend is indicated. If the objective is to find out 
what would happen if the yearly (or monthly, etc.) change should 
continue as it has in the past, a straight line trend is indicated. 

We will merely mention here two other important curves which
require more advanced mathematics in their treatment. The logistic,
or so-called Reed-Pearl curve, is used extensively in studying various
growth phenomena. Its function is of the form

$$y = \frac{1}{a + bc^x}$$

and it resembles somewhat the Gompertz curve discussed above.
For further discussion of this curve and methods of fitting it see

1. Elements of Statistics, Davis and Nelson.

2. Statistical Methods, Revised, Mills.

The function

$$y = ks^x g^{c^x}$$

is known as Makeham’s law. It is used in actuarial work. The student
having a working knowledge of the calculus will find an interesting
discussion of its use in the field of insurance in an article entitled
Makeham’s Laws of Mortality, Rietz, American Mathematical
Monthly, vol. 28, p. 471.

The logistic curve was used in studies on the rate of growth of the 
population of the United States. But its usefulness in this connec- 
tion fell somewhat short, apparently, 1 of the claims of its sponsors. 
Two other references relating to the population of our country may 
appropriately be mentioned here. Although they do not involve 
problems of curve fitting they do afford instructive examples of the 
application of scientific method to social and political problems. 
They are 

1. Bibliography on Methods of Apportionment in Congress — E. V. Huntington. 
American Mathematical Monthly , vol. 49 (1942), pp. 115-117. 

2. Determination of the Center of Population in the United States. School 
Science and Mathematics , May and June, 1942. 

Exercises 

1. If the rate of change of y with respect to x is always proportional to the 

attained value of y then y is what kind of a function of x ? 

2. Determine A and B in the best fitting curve of the type (13) for the following
data.

        Data          Form for Computations
       x       y       t      Y      tY     t^2
       0    1000
       5     100
      10      10
      15       1
      20      .1






3. (a) Prove formula (11). 

( b ) Graph the curve y - lOe* 2 *. 

4. Find the best fitting parabola for the following points: (-4, 2), (0, 8), (4, 9),
(8, 11), (12, 8), (16, 5). Ans. $y = 7.2 + .94x - .07x^2$.

1 Differential Equations Subject to Error, and Population Estimates, Harold
Hotelling. Jour. Amer. Stat. Assoc., vol. 22 (1927), pp. 283-314.

5. If the values of t form an arithmetic progression and $\sum t = 0$ prove that
$\sum t^3 = 0$.

6. (a) Add the values x = 30, y = 37 to the data of Example 6 and find the
trend line. Ans. y = .8x + 10.43.

(b) On the hypothesis that the apparent trend continues, predict the value
of y when x = 35.

7. In a tensile test of a metal bar the following observations were made, where
x represents the load in tons and y the elongation in ten-thousandths of
an inch:

    x |   1    2    3    4    5
    y |  14   27   40   55   68

Determine a linear relation between x and y by the theory of least squares. 

8. In the following table y represents the fire losses in the United States in 
millions of dollars. Taking the origin of x at 1915 find the best fitting 
line, in a least squares sense, for the data. 



9. (a) Add the values x - 6, y - 300 to the data of Example 7 (p. 153) and find 
the equation of the best fitting exponential curve. 

Ans. Y = .4617a; - .2534 
y = .56e l,06x . 

(6) Plot the given data and the curve obtained in (a) on semi-log paper. 

10. Distinguish between the forms of the curves represented by the functions 

y = Ae~ Bz and y — Ke' 1 ^ 2 where A , B, K, and h are positive real num- 
bers. If these functions were plotted on semi-log paper what kind of 
curves would be obtained? 

11. Determine by inspection the value of (a) 10^{log_10 c}, (b) a^{log_a N}.

12. Solve for x: log_10(x^2) = (log_10 x)(log_e x).

13. Solve for x: log_10(x^2) - log_10(x/10) = 2.

14. Determine a number x such that the square of log x exceeds log x by 2.
(Logs to base 10. Two answers.)

15. On semi-logarithmic coordinate paper, a straight line is drawn through the
points (2, 1) and (4, 100). Determine the function which has that line
as its graph. Hint: Use the form y = Ar^x. Ans. 100y = 10^x.

16. Same as exercise 15 for the points (1, 6) and (2, 18). Ans. y = 2(3^x).
17. On logarithmic coordinate paper, a straight line is drawn through the points
(2, 12) and (3, 27). Determine the function which has that line as its
graph. Ans. y = 3x^2.

18. Data from a certain experiment involving voltage (v) as a function of time 

(t) are plotted on logarithmic coordinate paper, and are found to exhibit 
a linear trend there. A line is drawn, with a transparent ruler, which seems 
to fit the plotted data best. Two points on this line are (6, 18) and (8, 
32). Determine an equation expressing v in terms of t whose logarithmic 
graph is the line. 

19. Draw the graph of y = 25x^n on logarithmic coordinate paper, (a) when
n = 2, (b) when n = -2. Mark scales clearly.

20. The graph of y = log_10 x assists one in remembering several important

properties of the logarithms of real numbers. Sketch this graph and 
state some of these properties. 

21. Read and report on one or more of the references cited in §17. 

Note . Source material for additional exercises on curve fitting 
may be found in the current volumes of the following publications: 

1. Statistical Abstract of the United States. 

2. World Almanac and Book of Facts. 



CHAPTER VIII 

CORRELATION THEORY 

1. The Meaning of Simple Correlation. So far we have been 
concerned with the problems which arise from variation in a single 
variable. We will now consider the simultaneous variation of two 
variables. Methods for disclosing the facts of co-variation and for 
measuring the degree of relationship existing between two variables 
are due mainly to the English biometricians Sir Francis Galton 
(1822-1911) and Karl Pearson (1857-1936). 

Data presenting two sets of related measurements or observations 
may arise in many fields of activity yielding N pairs of corresponding 
variates (x_i, y_i), i = 1, 2, 3, ..., N. Thus x may represent July
rainfall and y the average yield of corn in a certain section; x may be
an index of commodity prices and y an index of employment over 
the same period; we may be interested in a group of school children 
in which x is their height and y their weight, or x may refer to their 
reading ability and y to their spelling ability; we may be studying the 
chance distributions which are obtained in throwing two dice where 
x is the number obtained in throws of a single die and y is the number 
obtained in throws of the two dice together. 

Example 1. In the following set of selected heights (inches), x = stature of
father, y = stature of son.

    x    69   70   69   68   70   73   69   67   69   64
    y    68   69   72   67   70   71   72   66   71   65


Example 2. (Snedecor.) The following data on twelve trees are adapted from
the results of an experiment to test the phenomenon that the injury by codling
moth larvae seems to be greatest on apple trees bearing a small crop. Here
x = hundreds of fruit on a tree, y = percentage of fruits wormy.

    x    15   15   12   26   18   12    8   38   26   19   29   22
    y    52   46   38   37   37   37   34   25   22   22   20   14
When the given pairs of values are represented by dots locating
the points whose rectangular coordinates are (x, y) we obtain a so-called
"scatter diagram" (Figure 33). The problem is to determine the degree of
association, or correlation as it is called, between the x's and the
corresponding y's since this indicates the significance of the relationship.

The field of correlation may be thought of as bounded on the one
extreme by perfect functional dependence and on the other extreme
by complete independence in the probability sense. For example,
the pairs of values which satisfy the equation y = 2x - 5 do not present
a statistical problem. In this case the relationship is defined by a
mathematical function y = f(x). Similarly, at the other extreme we would
not be concerned with pairs of values which are completely independent in
the probability sense, as, for example, the grades of students in
statistics and the heights of their fathers. Two variables are said to be
statistically related when they lie between these two extremes of
relationship.

Fig. 33

The theory of correlation is concerned with a twofold problem: 
first with measuring the indicated relationship, and secondly with 
predicting or estimating the average value of y associated with a 
designated value of x. 

2. The Coefficient of Correlation. It is fairly obvious from Figure
33 that with values of x in an assigned interval Δx (Δx small) the
corresponding values of y differ considerably. There is said to be
positive correlation if, for an assigned x larger than x̄, the mean of the
corresponding y values is larger than ȳ, and, for values of x smaller
than x̄, the mean of the corresponding values of y is less than ȳ.
On the other hand, as x increases the tendency may be for y to decrease.
In this case, for an assigned x larger than x̄ the mean of the
corresponding y values is less than ȳ, and for an assigned x less than
x̄ the mean of the corresponding y's is greater than ȳ. There is then
said to be negative correlation. If, for an assigned x taken at random,
a corresponding y is no more likely to be above than below ȳ, the
variables are independent in the statistical or probability sense and
there is said to be zero correlation between them.

When the variables are correlated there is a tendency for the dots
in the scatter diagram to fall into a sort of band having a fairly
definite trend. We are assuming that this trend is linear, and a theory
built upon this assumption is known as simple or linear correlation.

In Figure 34 the origin of the x'y'-axes is taken at (x̄, ȳ). Then
the points of the scatter diagram are distributed over the four
quadrants of the x'y'-plane.



The coordinates of the points in the four quadrants have algebraic 
signs as follows. In quadrant 

I, x' and y' are positive; 

II, x' is negative and y' is positive; 

III, x' and y' are negative; 

IV, x' is positive and y' is negative.

Therefore, the product x'y' is positive for all dots which occur in 
quadrants I and III and negative for all dots in quadrants II and IV. 
The algebraic sum of all such products describes the distribution of 
the dots over the quadrants. When this sum is positive the trend 
of the dots is through quadrants III and I, when it is negative the 
trend is through II and IV, and when zero there is no trend, the dots 
being equally distributed over the four quadrants in the sense that 
the positive products of x'y' balance the negative products.
Consequently, a natural measure of correlation would be obtained by
summing the products x'y' for all the observed values and taking the
average by dividing the result by N. Moreover, if we first express
x' and y' in units of their respective standard deviations we obtain a
measure of correlation which is independent of the original units.
This measure is universally denoted by r. Thus we have in symbols,

(1)    $r = \dfrac{1}{N}\sum \dfrac{(x - \bar{x})}{\sigma_x}\cdot\dfrac{(y - \bar{y})}{\sigma_y}.$

It is variously called the total correlation, the product-moment
coefficient of correlation, and the correlation coefficient.

We may give the following word definition:

Definition. The correlation coefficient of two sets of variates
expressed in their respective standard deviations as units, is the
arithmetic mean of the products of deviations of corresponding values from
their respective means.

3. Other Formulas for r. Although formula (1) is very useful for
giving the meaning of the correlation coefficient, other formulas
easily obtained from (1) are usually much better adapted to numerical
computation. Since σ_x and σ_y are constants (1) may be written as

(2)    $r = \dfrac{\frac{1}{N}\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y}.$

It is useful to think of this as

       $r = \dfrac{\text{co-variance}}{[(\text{variance of } x)(\text{variance of } y)]^{1/2}}.$

Formula (2) reduces to

(3)    $r = \dfrac{\frac{1}{N}\sum xy - \bar{x}\bar{y}}{\left[\frac{1}{N}\sum x^2 - \bar{x}^2\right]^{1/2}\left[\frac{1}{N}\sum y^2 - \bar{y}^2\right]^{1/2}}.$

It will be proved later that r cannot be larger than 1 nor less than -1.
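The equivalence of formulas (1) and (3) can be checked numerically. The
following is a minimal sketch, assuming the father and son statures of
Example 1 as input; the function names are illustrative and do not come
from the text.

```python
from math import sqrt

# Father (x) and son (y) statures from Example 1, in inches.
x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Standard deviation with divisor N, as in the text."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def r_formula_1(x, y):
    """Formula (1): mean product of deviations in standard units."""
    mx, my, sx, sy = mean(x), mean(y), std(x), std(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

def r_formula_3(x, y):
    """Formula (3): computed from the raw sums of xy, x^2, and y^2."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    var_x = sum(a * a for a in x) / n - mx ** 2
    var_y = sum(b * b for b in y) / n - my ** 2
    return cov / sqrt(var_x * var_y)

# The two formulas should agree up to rounding error.
print(r_formula_1(x, y), r_formula_3(x, y))
```

Formula (3) needs only running totals, which is why it suits hand or
machine computation; formula (1) stays closer to the definition.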
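Theorem I. The value of r is independent of the origin of reference
and the units of measurement.

Proof: Let

       $u = \dfrac{x - x_0}{h}, \qquad v = \dfrac{y - y_0}{k}.$

Then

       $x = uh + x_0, \quad y = vk + y_0, \quad \bar{x} = \bar{u}h + x_0, \quad \bar{y} = \bar{v}k + y_0, \quad \sigma_x = h\sigma_u, \quad \sigma_y = k\sigma_v.$

Substituting in (2) we obtain

(4)    $r = \dfrac{\frac{1}{N}\sum (u - \bar{u})(v - \bar{v})}{\sigma_u \sigma_v}$

or

(4a)   $r = \dfrac{\frac{1}{N}\sum uv - \bar{u}\bar{v}}{\sigma_u \sigma_v}$

where

       $\sigma_u = \left[\frac{1}{N}\sum u^2 - \bar{u}^2\right]^{1/2}, \qquad \sigma_v = \left[\frac{1}{N}\sum v^2 - \bar{v}^2\right]^{1/2}.$

Since (4) and (4a) are independent of the constants x_0, y_0, h, and k,
the theorem is proved.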
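Theorem I can be illustrated numerically. The sketch below assumes the
tree data of Example 2 and arbitrary constants x0, y0, h, k chosen only
for illustration; h and k are taken positive, since a negative unit would
reverse the sign of r.

```python
from math import sqrt

def corr(x, y):
    """Product-moment r, using divisor N throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Tree data from Example 2: hundreds of fruit (x), percentage wormy (y).
x = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]
y = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]

# Hypothetical constants: new origins x0, y0 and positive unit sizes h, k.
x0, y0, h, k = 20.0, 30.0, 2.0, 0.5
u = [(a - x0) / h for a in x]
v = [(b - y0) / k for b in y]

r_xy = corr(x, y)
r_uv = corr(u, v)
print(r_xy, r_uv)  # the two values agree up to rounding error
```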

This property is of fundamental importance. It means that the units of 
measurement for the two sets of observed quantities can be chosen indepen- 
dently of each other. If the two sets of quantities are of the same kind, the 
units need not be the same in both cases; and, what is more important, if the 
quantities are of different kinds, so that the units are not comparable at all, the 
coefficient r nevertheless may have a definite meaning. (Of course the value of 
the coefficient will be affected by a change in the method of measurement of one 
of the quantities, such as the substitution of an area for a length in estimating 
the size of an object, or the assignment of different relative weights to the ques- 
tions on an examination. ) 

The pairs (x_i, y_i) may be all distinct or there may be repetitions among them.
But it is necessary to impose the condition that neither x_i nor y_i shall be
constant throughout. This condition is imposed to insure that the denominator
shall not vanish in the various formulas for r. (Dunham Jackson, The Algebra of
Correlation. American Mathematical Monthly, vol. 31 (1924), pp. 110-121.)

When the given values of x and y are large and a computing machine 
is not available, the computations may be lightened by an appropriate 
choice of these constants. If only the origin of reference is changed, 
then h = k = 1, and u = x - x_0, v = y - y_0. If the means are taken
as the origin of reference by letting x' = x - x̄ and y' = y - ȳ,
then x̄' = ȳ' = 0 and the formula becomes,

(5)    $r = \dfrac{\frac{1}{N}\sum x'y'}{\left[\frac{1}{N}\sum x'^2\right]^{1/2}\left[\frac{1}{N}\sum y'^2\right]^{1/2}}.$

A subscript notation should be attached to r when there are several
series of variates. Thus, r_xy for the (x, y) series, r_xz for the (x, z)
series, r_12 for the series denoted by (x_1, x_2), etc.

Example 3. To illustrate the formulas we will compute the value of r for the
following data. Here x = Brokers' Loans in billions of dollars and y = The
Annalist's index of the prices of fifty rail and industrial stocks in 1929. We
choose u = x - 5.00 and v = y - 250.

Month         x      y        u      v        uv       u^2        v^2
J           5.33    248      .33     -2     -0.66     .1089         4
F           5.67    248      .67     -2     -1.34     .4489         4
M           5.65    243      .65     -7     -4.55     .4225        49
A           5.56    249      .56     -1      -.56     .3136         1
M           5.53    235      .53    -15     -7.95     .2809       225
J           5.28    265      .28     15      4.20     .0784       225
J           5.77    282      .77     32     24.64     .5929      1024
A           6.02    303     1.02     53     54.06    1.0404      2809
S           6.35    290     1.35     40     54.00    1.8225      1600
O           6.80    230     1.80    -20    -36.00    3.2400       400
N           4.88    201     -.12    -49      5.88     .0144      2401
D           3.45    206    -1.55    -44     68.20    2.4025      1936
Sums                        6.29      0    159.92   10.7659     10678
(1/N) Sums                 .5242      0   13.3267     .8972  889.8333

Computations: σ_u = [.8972 - (.5242)^2]^{1/2} = .79

              σ_v = [889.8333]^{1/2} = 29.83.

From (4a) we have,

       $r = \dfrac{13.3267}{(29.83)(.79)} = .57.$
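The arithmetic of Example 3 can be reproduced in a few lines. This is a
minimal sketch that follows formula (4a) with the same shifted variables
u and v; the variable names are illustrative.

```python
from math import sqrt

# Example 3 data: brokers' loans (x) and stock price index (y), 1929.
x = [5.33, 5.67, 5.65, 5.56, 5.53, 5.28, 5.77, 6.02, 6.35, 6.80, 4.88, 3.45]
y = [248, 248, 243, 249, 235, 265, 282, 303, 290, 230, 201, 206]

# Shifted variables chosen in the example.
u = [a - 5.00 for a in x]
v = [b - 250 for b in y]
n = len(u)

mean_u = sum(u) / n
mean_v = sum(v) / n
mean_uv = sum(a * b for a, b in zip(u, v)) / n
sigma_u = sqrt(sum(a * a for a in u) / n - mean_u ** 2)
sigma_v = sqrt(sum(b * b for b in v) / n - mean_v ** 2)

# Formula (4a).
r = (mean_uv - mean_u * mean_v) / (sigma_u * sigma_v)
print(round(sigma_u, 2), round(sigma_v, 2), round(r, 2))  # 0.79 29.83 0.57
```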
Experienced computers use calculating machines to great advantage
in large-scale computational studies. The following reference is
recommended to students who expect to engage in such work:
P. S. Dwyer, "The Calculation of Correlation Coefficients from Ungrouped
Data," Journal of the American Statistical Association, vol. 35
(1940), pp. 671-673.

Exercises 

1. When x' and y' represent deviations from the means,

(a) Show from (1) that Σx'y' = N r σ_x σ_y.

(b) Show that N σ_x^2 = Σx'^2.

2. Derive formula (3) from (2).

3. Show that (3) may be written as

       $r = \dfrac{N\sum xy - \sum x \sum y}{\left[\{N\sum x^2 - (\sum x)^2\}\{N\sum y^2 - (\sum y)^2\}\right]^{1/2}}.$

4. Find r for the data of Example 1.

5. Find r for the data of Example 2.

6. The following data represent the ages of husband (x) and wife (y) of twenty
couples. Find r using (5). Ans. 0.856.


X 

22 

24 

26 

26 

27 

27 

28 

28 

29 

30 

30 

30 

1 

31 

32 

33 

34 

CO 

35 

36 

1 CO 

y 

18 

20 

20 

24 

22 

24 

27 

24 

21 

25 

29 

32 

27 

27 

30 

27 

30 

31 

30 

32 


7. In studying a set of pairs of related variates, a statistician has completed the 

preliminary arithmetic and obtained the following results: 

N = 100; Σx^2 = 1,585,000; Σx = 12,500; Σxy = 1,007,425; Σy^2 =
648,100; Σy = 8,000. Find x̄, ȳ, σ_x, σ_y, r.

8. The table in Exercise 2, page 97, contains the grades made on two tests by
twenty-five students in mathematics. Find r for these data. Ans. 0.786.

9. Suggest examples of negative correlation. 

10. In the following anthropometric measurements on a random sample of
twenty male freshmen, taken from the Physical Education Department,

      x       y       z          x       y       z
    68.5    33.6    148        65.3    33.0    136
    67.2    35.0    144        65.1    34.0    144
    67.7    30.2    145        64.8    37.3    170
    63.8    30.0    108        69.6    33.4    154
    69.9    33.0    130        68.2    31.5    122
    64.7    31.0    112        68.8    32.0    141
    68.4    33.0    134        72.3    35.0    159
    66.4    30.2    112        67.8    33.7    134
    69.1    33.3    143        71.3    31.5    136
    71.0    32.3    136        63.5    33.6    126

x represents height, y represents chest measurement, both measurements
being taken to the nearest tenth of an inch, and z represents weight to the
nearest pound. Find the coefficient of correlation (a) between x and y,
(b) between x and z, (c) between y and z.


4. Regression. The properties of r can be studied by fitting a
line to the scatter diagram in such a way as to make the sum of the
squares of the vertical distances from the points to the line a minimum.

When such a line is referred to the point (x̄, ȳ) as origin, we have
seen (§9, Chapter VII) that its equation is y' = m_1 x' where

       $m_1 = \dfrac{\sum x'y'}{\sum x'^2}$

and x' = x - x̄, y' = y - ȳ. This value of m_1 may easily be
expressed in terms of r and the standard deviations, as follows:

       $m_1 = \dfrac{N r \sigma_x \sigma_y}{N \sigma_x^2} = r\,\dfrac{\sigma_y}{\sigma_x}.$

Therefore, the equation of our line, referred to a system of axes whose
origin is at the means of the variates, is

(6)    $y' = r\,\dfrac{\sigma_y}{\sigma_x}\,x'.$

This is called the regression line of y on x. The term regression
was used first by Galton in studying inheritance of stature. He
found that offspring of abnormally tall or short parents tend to
"step back" or "regress" to the ordinary population height.
However, as now used, regression line has no reference to biometry,
but is merely a convenient term.

By fitting a line x' = m_2 y' to the points of the scatter diagram in
such a way that the sum of the squares of the horizontal distances
from the points to the line shall be a minimum, it is possible to
deduce a second regression line (the regression line of x on y) whose
equation, referred to (x̄, ȳ), is

(7)    $x' = r\,\dfrac{\sigma_x}{\sigma_y}\,y'.$

Note that (7) cannot be obtained by solving for x' in (6). The
two regression lines will coincide if, and only if, r = ±1. From
the equations of the regression lines it is evident that if r > 0, an
increase in the one variable tends to accompany an increase in the
other; if r < 0, an increase in the one will be accompanied by a
decrease in the other.

Equations (6) and (7) are usually expressed in terms of the
original variables x and y instead of the deviations x' and y'. It is
obvious that they may be written as

(8)    $y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

and

(9)    $x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

when referred to the origin of x and y.

Equation (8) may be used to estimate values of y corresponding
to designated values of x. Similarly, from equation (9) we may
estimate x for designated values of y. It would be appropriate to
use (8) as a predicting equation when the variation in y is caused or
controlled by the variation in x; (9) would be used when the variation
in x is caused or controlled by the variation in y.
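The two regression equations can be sketched in code. The example below
assumes the father and son statures of Example 1; the function names
estimate_y and estimate_x are illustrative only. It also shows that
solving (8) for x does not give the same estimate as (9).

```python
from math import sqrt

# Father (x) and son (y) statures from Example 1, in inches.
x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * sigma_x * sigma_y)

def estimate_y(x_value):
    """Equation (8): regression of y on x."""
    return mean_y + r * (sigma_y / sigma_x) * (x_value - mean_x)

def estimate_x(y_value):
    """Equation (9): regression of x on y."""
    return mean_x + r * (sigma_x / sigma_y) * (y_value - mean_y)

# Both lines pass through the point of means.
print(estimate_y(mean_x), estimate_x(mean_y))

# Estimating y from x = 72 and then x from that y does not return 72,
# because (8) and (9) are different lines unless r = +1 or -1.
y_hat = estimate_y(72)
print(y_hat, estimate_x(y_hat))
```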

The quantity m_1 = r(σ_y/σ_x) is called the regression coefficient of y
on x, being the variation in y corresponding to a unit change in x.
Likewise, m_2 = r(σ_x/σ_y) is called the regression coefficient of x on y.
Thus the numerical value of r is given by (m_1 m_2)^{1/2} but its sign must
be that which is common to the two regression coefficients. The
following quotation from Snedecor (reference 13, list p. 6) sheds light
on the distinction between regression and correlation.

The point of interest here is that r is the geometric mean of the two regression 
coefficients. In ordinary units of measurement, therefore, r is an average of the 
two regression coefficients used in (i) estimating y from x and (ii) estimating 
x from y. This serves to clarify the relation of the two coefficients, correlation 
and regression, in measuring relationship. The latter is the appropriate one if 
one variable, y, may be designated as dependent on the other, x. Values of y 
may be partly controlled or caused by x, as when the available amounts of some 
glandular secretion cause differences in the sizes of organisms. Or, y may be 
subsequent to x, as weight gain in nutrition experiments follows the measurement
of initial weight. In such cases, the regression of y on x is usually the statistic 
that furnishes the information desired. It is then appropriate to attempt to 
estimate the value of y from a knowledge of the corresponding value of x.
Correlation, on the other hand, is the appropriate measure of the relation
between two variates like statures of husband and wife. The two heights are
known to be associated through some complex of social and biological causes,
but neither may be looked upon as a consequence of the other. In this sense
correlation is a two-way average of relationship, while regression is
directional. Of course, there are many variables whose relationship may be
studied by means of either correlation or regression, or both. It is
necessary only to keep clearly in mind the character of the relation being
considered.

Geometrically, m_1 is the slope of line (8) and 1/m_2 is the slope of
line (9). The two lines intersect at (x̄, ȳ).



Exercises 

1. Derive the equation of the line of regression of x on y as suggested above. 

2. Find the equations of both lines of regression for Exercise 6 (page 176), and
plot them. Ans. y = .888x - .64
x = .825y + 8.55.

3. Using the appropriate equation, find the estimated values of y corresponding 

to the given values of x, for Exercise 6 (page 176). 

4. Given the following results for the heights and weights of 1000 men students: 

ȳ = 68.00 in., x̄ = 150.00 lbs., r = .60,

σ_y = 2.50 in., σ_x = 20.00 lbs.

John Doe weighs 200 lbs., Richard Roe is five feet tall. 

Estimate the height of Doe from his weight, and the weight of Roe from 
his height. 

Ans. Doe’s height = 71.75 in. 

Roe’s weight = 111.6 lbs. 

5. (a) Given the following: 

Σx = 150,000, Σx^2 = 22,725,000, Σxy = 10,522,500,

Σy = 70,000, Σy^2 = 4,936,000, N = 1000.

Find x̄, ȳ, σ_x, σ_y, r, and the lines of regression.

(b) Suppose the data in (a) refer to the weight in pounds (x) and the height
in inches (y) of a sample of 1000 policemen. Suppose Paul Private weighs 
160 pounds and Saul Sergeant is 6 feet tall. Estimate the height of 
Private and the weight of Sergeant. 


5. The Standard Error of Estimate. The average concentration
of the points around the regression line of y on x may be measured
by the expression (1/N) Σd^2 where d is the difference between an
observed y and the y obtained from the regression line. The value of
(1/N) Σd^2 will be denoted by S_y^2, and S_y is called the standard
deviation of the errors of estimate, or more briefly the standard error of
estimate. The errors of estimate are the deviations of the observed values
of y from the corresponding estimated y's. Or to describe them another
way, they are the deviations of the sample y's from the assumed
population y's. It can be shown that S_y^2 = σ_y^2 (1 - r^2). To prove
this we may write the sum of the squares of the deviations in the
form:

       $N S_y^2 = \sum\left(y' - r\,\dfrac{\sigma_y}{\sigma_x}\,x'\right)^2 = \sum y'^2 - 2r\,\dfrac{\sigma_y}{\sigma_x}\sum x'y' + r^2\,\dfrac{\sigma_y^2}{\sigma_x^2}\sum x'^2$

       $= N\sigma_y^2 - 2Nr^2\sigma_y^2 + Nr^2\sigma_y^2 = N\sigma_y^2(1 - r^2).$

Hence, we have

(10)   $S_y^2 = \sigma_y^2(1 - r^2)$

and

(10a)  $S_y = \sigma_y(1 - r^2)^{1/2}.$

An analogous consideration of the differences between the x's and
the regression line of x on y gives for the square of the standard
error of estimate of the x's

(11)   $S_x^2 = \sigma_x^2(1 - r^2).$
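Formula (10a) can be checked against a direct computation of the errors
of estimate. This is a minimal sketch, assuming the tree data of
Example 2; the variable names are illustrative.

```python
from math import sqrt, isclose

# Tree data from Example 2: hundreds of fruit (x), percentage wormy (y).
x = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]
y = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sigma_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sigma_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * sigma_x * sigma_y)

# Errors of estimate d: observed y minus the y given by line (8).
slope = r * sigma_y / sigma_x
d = [b - (mean_y + slope * (a - mean_x)) for a, b in zip(x, y)]

s_y_direct = sqrt(sum(e * e for e in d) / n)   # root mean square of d
s_y_formula = sigma_y * sqrt(1 - r ** 2)       # formula (10a)
print(s_y_direct, s_y_formula)                 # equal up to rounding error
```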

6. Properties of the Correlation Coefficient and Standard Error
of Estimate. Certain properties of r may now be deduced. It is
obvious from (10) that |r| ≤ 1 because both the left member and
σ_y^2 are positive or zero. Therefore,

       -1 ≤ r ≤ 1.

If the points all lie exactly on the regression line, the left member of
(10) vanishes and r = ±1. There is then said to be perfect linear
correlation, since the relation between x and y is given exactly by a
linear function. A large numerical value of r means that the regression
lines are close to coincidence and the points in a scatter diagram
cluster closely around the regression lines.

When the regression lines (8) and (9) are expressed in standard
units, they become respectively

(12)   $t_y = r\,t_x$

and

(13)   $t_x = r\,t_y \quad\text{or}\quad t_y = \dfrac{1}{r}\,t_x$

where

       $t_x = \dfrac{x - \bar{x}}{\sigma_x} \quad\text{and}\quad t_y = \dfrac{y - \bar{y}}{\sigma_y}.$

In this form we see at once that as one variable t_x increases, the other
variable t_y increases (or decreases) to an extent that depends upon r.
Thus r measures co-variation in the variables when they are expressed
in comparable units and when regression is linear.

In standard units, r is the slope of line (12) and 1/r is the slope of
line (13). When r = 0, the regression equations become t_y = 0 and
t_x = 0 in standard units or y = ȳ and x = x̄ in the original units.
These are also the equations of the coordinate axes. Therefore, when
r = 0 the regression lines are perpendicular to each other and coincide
with the t_x and t_y axes. When r = 1 the regression equations become
identical and the two lines coincide in quadrants I and III. Similarly,
when r = -1 they coincide in quadrants II and IV. In each
case the coincident lines bisect the quadrants if the equations are
expressed in standard units, but not otherwise unless σ_y = σ_x. The
angle θ between the regression lines varies from 0° to 90° as r varies
from one to zero.

When there is no correlation between x and y then r = 0, and the 
variables are said to be independent in the statistical sense. On the 
other hand, when r = 0, it is not necessarily true that the variables 
are statistically independent. Indeed there may be a high correla- 
tion 1 with non-linear regression when r = 0. (Non-linear regression 
will be considered in §21.) Incidentally, the phrase “independent 
variables ” in the statistical sense should not be confused with the 
phrase “ independent variables ” which is used in the ordinary sense 
of analysis to designate the variables on which a specified function 
depends. However, the two usages, though quite distinct, are not 
fundamentally contradictory, since functional dependence can be 
regarded as a limiting case of statistical dependence. 

1 See H. L. Rietz, On Functional Relations for which the Coefficient of Correlation
is Zero. Journal American Statistical Association, vol. 16, 1919, pp. 472-476.
For an appreciation of the use of S_y in passing judgment upon the
precision to be expected in estimating values of y by means of the
regression equation of y on x, it is instructive to consider
simultaneously the meanings of (8) and (10a) as |r| varies from 0 to 1. When
r = 0, (8) becomes y = ȳ which means that the best estimate of y
for any value of x is the mean of the y-distribution. In other words,
knowledge of x is of no value in predicting y. When r = 0 in (10a),
S_y = σ_y. This is to be expected since the dispersion S_y about the
line y = ȳ is the same as the dispersion σ_y of the given y's about their
mean. But as |r| increases from 0 to 1, S_y decreases from σ_y to 0.
Graphically, the meaning of this improvement in S_y in comparison
with σ_y, as r increases, is shown in Figure 36 where parallel lines are
drawn at a vertical distance of S_y on either side of the regression line
RR'. For a given value of |r| ≠ 0 this strip encloses the average
dispersion about the line. The strip on either side of y = ȳ at a
distance of σ_y from it encloses the average dispersion about the line when
r = 0. As |r| increases from 0, the line rotates from the horizontal
position of y = ȳ to the terminal position it would have when |r| = 1,
and at the same time S_y decreases toward 0. Formula (10a) tells
us that as |r| thus increases, S_y decreases from σ_y in proportion to
(1 - r^2)^{1/2}.

Fig. 36. For a Fixed Value of σ_y, S_y Decreases in Proportion
to (1 - r^2)^{1/2} as r Increases

A similar analysis could be made concerning the line of regression
(9) of x on y, which rotates from the vertical position x = x̄ when
|r| = 0 to meet and coincide with line (8) when |r| = 1. As line (9)
rotates, S_x decreases from σ_x to 0 in proportion to (1 - r^2)^{1/2} as
r increases.
As |r| → 1, (12) and (13) rotate toward each other at equal angular
velocities. When they are coincident their slope is ±1. Lines (8)
and (9) rotate at angular velocities which are proportional to
m_1 = tan α and m_2 = tan β, respectively, where m_1 and m_2 are
defined in §4. Their slope at coincidence is ±σ_y/σ_x. For line (12)
it can be shown that

(14)   $\dfrac{1}{N}\sum \delta^2 = 1 - r^2$

where δ is the difference between an observed value of t_y and the
ordinate obtained from (12) for the corresponding value of t_x. Thus,

       $\dfrac{1}{N}\sum \delta^2 = \dfrac{1}{N}\sum (t_y - r\,t_x)^2 = \dfrac{1}{N}\sum t_y^2 - \dfrac{2r}{N}\sum t_x t_y + \dfrac{r^2}{N}\sum t_x^2$

       $= 1 - 2r^2 + r^2 = 1 - r^2.$

This result would also be apparent from the derivation of (10) since
δ = d/σ_y where d refers to residuals in units other than standard
units.

It is obvious from (14) that the maximum value of (1/N) Σδ^2 is unity.
Therefore, adopting

(15)   $1 - \dfrac{1}{N}\sum \delta^2$

as a measure of goodness of fit, we see from (14) and (15) that r^2
is a measure of the goodness of fit of (12) to the points of the scatter
diagram expressed in standard units. By an analogous argument a
similar conclusion concerning (13) can be made.

7. Further Discussion. Given a set of N pairs of x and y correlated
values. Suppose the necessary constants are evaluated to
obtain the regression equation (8). Then if the given values of x
are substituted in this equation, a set of estimated y's, say E_y, will be
obtained. The mean, Ē_y, of these estimated y's is the same as the
mean of the observed y's. The proof is as follows. From (8) we have

       $E_y = \bar{y} + r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x}).$

Then

       $\bar{E}_y = \bar{y} + r\,\dfrac{\sigma_y}{\sigma_x}\cdot\dfrac{1}{N}\sum (x - \bar{x}).$

But (1/N) Σ(x - x̄) = 0 by Theorem VI, Chapter III. So Ē_y = ȳ.

We now state the following theorem.

Theorem II. The variance, σ_Ey^2, of the estimated y's equals r^2 σ_y^2.

Proof: By definition,

       $\sigma_{Ey}^2 = \dfrac{1}{N}\sum (E_{y_i} - \bar{E}_y)^2.$

From the above discussion, (E_yi - Ē_y) is the same as (E_yi - ȳ) which
is given by (8). So

       $\sigma_{Ey}^2 = \dfrac{1}{N}\sum\left[r\,\dfrac{\sigma_y}{\sigma_x}\,(x_i - \bar{x})\right]^2 = r^2\,\dfrac{\sigma_y^2}{\sigma_x^2}\cdot\dfrac{1}{N}\sum (x_i - \bar{x})^2.$

Hence

(16)   $\sigma_{Ey}^2 = r^2\sigma_y^2.$

From this theorem and (10) we obtain

(17)   $S_y^2 = \sigma_y^2 - \sigma_{Ey}^2.$

This relation helps to clarify the meaning of r and of S_y. It is
conventional to call σ_Ey^2 the variance in y which can be explained from
knowledge of x; that is, which the regression of y on x accounts for.
(In the language of some writers, σ_Ey^2 measures the variation of
regression about the mean.) Therefore, (17) shows that S_y^2 is the
variation in y after the accompanying variation in x is duly
discounted. S_y^2 is sometimes called the residual variance because it
measures the variation in the dependent variable y which knowledge
of x fails to account for. This relation can be depicted geometrically
by the sides of a right triangle. To standardize the representation
we can take σ_y = 1 as the diameter of a semicircle within which
is inscribed the right triangle, as in Figure 37. In the figure,
cos θ = σ_Ey/σ_y. So from (16) we have cos θ = r. The particular values of
θ in the figure, found from a table of cosines, are θ = 36° 52' when
r = .8, and θ = 25° 50' when r = .9. When r = 1, then σ_Ey = σ_y
and the regression of y on x accounts for all the variation in y.




Theorem III. The correlation between observed and estimated values 
of y is the same as that between the observed values of x and y. 

Proof: We are to show that

       $\dfrac{\frac{1}{N}\sum y\,E_y - \bar{y}\,\bar{E}_y}{\sigma_{Ey}\,\sigma_y}$

reduces to one of the formulas for r. Substituting the values for
E_y, Ē_y, σ_Ey into the above expression and simplifying, we obtain (3).
The details of the proof are left to the student as an exercise.
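Theorems II and III and relation (17) can be checked numerically. The
sketch below assumes the statures of Example 1, for which r is positive;
with a negative r the estimated values would correlate with y at |r|.
The helper names are illustrative.

```python
from math import sqrt, isclose

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with divisor N, as in the text."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = mean([(p - ma) * (q - mb) for p, q in zip(a, b)])
    return cov / sqrt(variance(a) * variance(b))

# Father (x) and son (y) statures from Example 1; r is positive here.
x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]

r = corr(x, y)
slope = r * sqrt(variance(y) / variance(x))
estimated = [mean(y) + slope * (a - mean(x)) for a in x]           # equation (8)
residual_var = mean([(b - e) ** 2 for b, e in zip(y, estimated)])  # S_y^2

print(isclose(mean(estimated), mean(y)))                         # equal means
print(isclose(variance(estimated), r ** 2 * variance(y)))        # Theorem II, (16)
print(isclose(residual_var, variance(y) - variance(estimated)))  # relation (17)
print(isclose(corr(y, estimated), r))                            # Theorem III
```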

8. Coefficient of Alienation. A measure of the failure to improve
estimates of y from knowledge of correlation is given by

(18)   $k = (1 - r^2)^{1/2}.$

It is sometimes called the coefficient of alienation. Incidentally, it
is interesting to observe that the functional relation between k and r
is shown, graphically, by a semicircle of unit radius, i.e.,

       $f(r) = (1 - r^2)^{1/2}.$

The formula

       $k' = 1 - (1 - r^2)^{1/2}$

may be called the improvement factor because it shows the decrease
in S_y/σ_y as |r| increases. It is clear that

       $\dfrac{S_y}{\sigma_y} = k \quad\text{and}\quad k' = 1 - k.$

Table 31 gives¹ values of k and k' for values of r. With no knowledge
of correlation, the best estimate of an individual y is ȳ. Values
of k' for assigned r's show how much better than this guess is the
estimate of an individual y value with knowledge of correlation.
For example, when r = .5 the column headed k in Table 31 shows
that the standard error S_y is about 87% of σ_y. Or, from the k'
column, S_y has been reduced only 13% from what it would have
been if ȳ had been used for prediction purposes. The third column
thus shows how the prediction value of r varies with r. Thus as |r|
decreases from 1 to .8, S_y/σ_y increases from 0 to 60%. Or from
another point of view, as |r| increases from 0 to .8, the error of
estimate is improved by only 40%. A correlation of r = .9 permits
prediction of individual y's only 56% better than a mere guess based
on the mean.

It is fairly obvious that we cannot, with any considerable degree 
of reliability, predict from ordinary values of r an individual y for an 
assigned x. However, with a large N, we can give a very reliable 
prediction of the mean of y values that correspond to an assigned 
value of x. This can best be explained from a correlation table 
which is used when N is large and which will be explained in the 
next section. 


Table 31. Values of r and the Corresponding Values of k and k'

     r       k       k'
    .1     .995    .005
    .2     .980    .020
    .3     .954    .046
    .4     .917    .083
    .5     .866    .134
    .6     .800    .200
    .7     .714    .286
    .8     .600    .400
    .9     .436    .564
    .92    .392    .608
    .94    .341    .659
    .96    .280    .720
    .98    .199    .801
   1.00    .000   1.000


¹ Constructed from a table of sines and cosines. Letting $r = \cos\theta$,
$\sin\theta = (1 - \cos^2\theta)^{1/2} = (1 - r^2)^{1/2}$.

Exercises

1. Given the following correlated data: 


    x    8    6    4    7    5
    y    9    8    5    6    2


(a) Compute the correlation coefficient.

(b) Find the regression line of y on x.

(c) Find the estimated values of y corresponding to the given values of x.

(d) Compute the standard error $S_y$ of predictions in two different ways.
Ans.

$r = \frac{2.4}{\sqrt{2}\,\sqrt{6}} = .69$, $m_1 = \frac{.69\sqrt{6}}{\sqrt{2}} = 1.2$, $S_y = \sqrt{3.12} = 1.76$.

Note. In practical work, it is never worth while calculating a correla- 
tion coefficient for so few observations. These fictitious data are given 
solely as an exercise on which the student can test his knowledge of the 
methodology. 
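
The arithmetic of Exercise 1 can be checked with a few lines of Python. This is a minimal sketch and not part of the original text; all variable names are assumptions, and variances are taken with divisor N, as elsewhere in this chapter.

```python
import math

x = [8, 6, 4, 7, 5]
y = [9, 8, 5, 6, 2]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
dx = [xi - mean_x for xi in x]          # deviations x'
dy = [yi - mean_y for yi in y]          # deviations y'

var_x = sum(d * d for d in dx) / n      # sigma_x^2
var_y = sum(d * d for d in dy) / n      # sigma_y^2
cov_xy = sum(a * b for a, b in zip(dx, dy)) / n

r = cov_xy / math.sqrt(var_x * var_y)   # (a) correlation coefficient
m1 = cov_xy / var_x                     # (b) slope of regression of y on x
estimates = [mean_y + m1 * d for d in dx]   # (c) estimated y values

# (d) S_y in two ways: directly from residuals, and from sigma_y^2 (1 - r^2).
s_y_direct = math.sqrt(sum((yi - ei) ** 2 for yi, ei in zip(y, estimates)) / n)
s_y_formula = math.sqrt(var_y * (1.0 - r * r))

print(round(r, 2), round(m1, 2), round(s_y_direct, 3), round(s_y_formula, 3))
```

The two values of $S_y$ agree, both being the square root of 3.12.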

2. Prove that the ratio of the variance of the estimated y's (taken about their
mean) to the variance $\sigma_y^2$ of the given y's is equal to $r^2$.

3. If $S_y^2/\sigma_y^2 = 1 - r^2$ is the percentage of the total variance of y uncontrolled
by knowledge of x, what is the remaining percentage, determined by or
calculable from knowledge of x?

4. What equation is the equivalent mathematical statement for the following 

words? 

If the respective deviations in each series, x and y, from their means 
were expressed in units of standard deviations — that is, if each were 
divided by the standard deviation of the series to which it belongs — and 
plotted to a scale of standard deviations, the slope of a straight line best 
describing the plotted points would be the correlation coefficient r. 

5. Given the standard deviations $\sigma_x$ and $\sigma_y$ of two distributions of correlated
variates:

(a) What is the standard error in estimating y from x if r = 0?

(b) By how much is $S_y$ in (a) reduced if r is increased to .25?

(c) How large must r be in order that $S_y$ be one-half as large as in (a)?

(d) What must r be in order that $S_y$ be reduced to one-third its value in (a)?

(e) At what value of r is $S_y$ reduced to zero?

(f) For any value of r, what is the ratio between the standard error of
estimating y from x and the standard deviation of the y-distribution?

6. Evaluate the following statements:

(a) A correlation coefficient less than zero indicates an absence of linear
relationship.

(b) A correlation coefficient of r = .6 indicates twice as close relationship
as a coefficient of r = .3.

7. If all the points lie exactly on the regression line of y on x, show that $S_y^2 = 0$
and hence that r = ±1.

8. Show that $S_y^2$ may be computed by means of the relation

$$N S_y^2 = \sum y'^2 - \frac{\left(\sum x'y'\right)^2}{\sum x'^2}$$

where the primes denote deviations from the means.

9. (For analytics students.) Show that the tangent of the angle from line (8)
to line (9) is

$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

and from line (12) to line (13) is

$$\tan\theta = \frac{1 - r^2}{2r}.$$

What is the value of $\theta$ when r = 1; when r = 0?

10. The least-squares criterion of best fit requires that $\sum\delta^2$ be a minimum,
where $\delta$ is the distance between the line and a point. Three cases arise
depending on whether
Case I, $\delta$ is measured parallel to the y-axis,

Case II, $\delta$ is measured parallel to the x-axis,

Case III, $\delta$ is measured perpendicular to the line.

We have seen that Case I yields line (12) and that Case II yields line (13).
In Case III the line has no universally accepted name but it may be called
the “geometrically best-fitting line.”

(For calculus students.) For Case III prove the following:

(a) In standard units, the equation of the line is

$t_y = t_x$ if r > 0
and $t_y = -t_x$ if r < 0.

Solution. Let the equation of the required line be
$t_y = m t_x + k$.

Then by analytics,

$$\frac{1}{N}\sum\delta^2 = \frac{1}{N}\sum\left(\frac{m t_x + k - t_y}{\sqrt{1 + m^2}}\right)^2 = \frac{m^2 + k^2 + 1 - 2mr}{1 + m^2}.$$

To make this a minimum, first put $k^2 = 0$. Call the result f(m). Then

$$f(m) = \frac{m^2 + 1 - 2mr}{1 + m^2},$$

$$f'(m) = \frac{2m^2r - 2r}{(1 + m^2)^2},$$

$$f''(m) = \frac{4mr(3 - m^2)}{(1 + m^2)^3}.$$

The second derivative will be positive when m and r have the same sign.
Since f(m) is a minimum when m = ±1, we are to take m = 1 when
r > 0 and m = -1 when r < 0.

(b) If r = 0, all lines (for which $k^2 = 0$) fit equally well. Hint. If r = 0,
f(m) = 1.

(c) $\frac{1}{N}\sum\delta^2 = 1 - |r|$. What is the value of f(m) when m = ±1?
Note that |r| = +r, if r > 0 and |r| = -r, if r < 0.

(d) Goodness of fit is measured by |r|. 

(e) When r = .6 the fit is twice as good as when r = .3. 

11. The following query and answer appeared in Biometrics Bulletin, vol. 1,
no. 3, pp. 36-37. “Research” assignment: Investigate the references
cited in the answer and justify the procedure which is recommended
(under the given hypothesis).

Query. A problem that has bothered me is the fitting of regression 
lines when their position is restricted in some way. For example, suppose 
a test is made of the relationship between the number of fish caught in a 
body of water and the average number which can be caught out of it, with
a standard amount of fishing. In fitting a regression line to such data, 
we know that the point (0, 0) must fall on the line, since if no fish are 
present certainly none will be caught. In other words, we have one 
point which is free from sampling error. The unique importance of this 
point will, it seems to me, make observations in its neighborhood of rela- 
tively less importance than observations at a distance from it, where 
there is no fixed guide-post. Do you know of any treatment of situa- 
tions of this sort, by which the best straight (or curved) line could be 
fitted to data where there is one point which must be satisfied? The 
standard deviation from regression (“standard error of estimate”) and 
the standard error of the regression would also be available. Or are these
concepts pertinent in such a question? 

Answer. Deming (§15 and §11 of reference 4) gives both a general 
method and some particular solutions of your problem. Snedecor (refer- 
ence 6) opens his Chapter 6 with an illustration of the simple case in 
which x is measured without error and the variance of y is constant for 
all values of x. 

Observations in the neighborhood of (0, 0) may or may not be of less 
importance than those at greater distances; it depends on the variance 
of y. One often finds that this variance increases with x. In fact, there 
are many situations in which it seems reasonable to suppose that in the 
sampled population the standard deviation of y is directly proportional 
to x. If you think this hypothesis is suitable in your fishing, the appro- 
priate method is to calculate the ratios x/y where x is the number of fish 
caught and y is the total number of fish, then apply to them the statisti- 
cal procedure suitable for a single variate. — George W. Snedecor. 

9. Correlation Table. When the sample to be studied is large, 
it is more convenient to replace the scatter diagram by a correlation
table. We may divide the xy-plane into rectangles of convenient
size, and all points of the scatter diagram falling within any rectangle 
are thought of as being concentrated at the center of this rectangle. 
A number is then written within the rectangle to designate the 
number of points at its center. A correlation table is therefore a 
two-way frequency table exhibiting the frequencies in each class 
interval. 


Table 32

  y \ x         65-69  70-74  75-79  80-84  85-89  90-94  95-99
 interval mark     67     72     77     82     87     92     97   f(y)
  90-94     92                           1      2      3      1      7
  85-89     87                    1      3      8      1      5     18
  80-84     82      4      4      6      4      9      1            28
  75-79     77      3      3      7      6      4                   23
  70-74     72      2      3      5      6      1      1            18
  65-69     67      3      2                                         5
  60-64     62      1                                                1
  f(x)             13     12     19     20     24      6      6    100


Suppose Table 32 is constructed in this way for a set of average 
daily grades (x) and final examination grades (y) of 100 students. 
When the data have been thus grouped into classes, the class marks 
are regarded as the variate values. Thus in Table 32 there are 9 
students whose daily grades are 87 and whose final examination grades 
are 82. The last column labeled f(y) represents the distribution
of y variates and the last row labeled f(x) represents the distribution
of x variates. A correlation table is thus a bivariate distribution.
In Table 32 the width of the class interval is the same for x and y,
but of course this is not generally the case.

10. Notation. In order to compute r from a correlation table it
will be necessary to develop new notation. Since we are now dealing
with frequencies in both the x-direction and the y-direction, we will
distinguish between them by f(x) and f(y). To be sure, this has
the disadvantage of being the same symbol as that for function, but
from the context no ambiguity should arise.





Generalizing, a correlation table is of the following form:

  y \ x    x_1   . . .     x     . . .   x_n  |  f(y)
  y_n                                         |
  . . .                                       |
  y                     f(x, y)               |  \sum_x f(x, y)
  . . .                                       |
  y_1                                         |
  --------------------------------------------+----------------------------
  f(x)               \sum_y f(x, y)           |  N = \sum_x \sum_y f(x, y)


The rectangles containing the frequencies are called cells. The
frequency in a typical cell is denoted by f(x, y), meaning the frequency
in the cell whose coordinates are x and y, where x and y are the
mid-values of the class intervals. Both columns and rows are sub-
distributions of the total frequency N. Each column is a frequency
distribution of y's corresponding to a mid-x value. Similarly, each
row is a frequency distribution corresponding to a mid-y value.
The sum along any row is denoted by $\sum_x f(x, y)$, being the sum of
the frequencies in the (x, y) cells in the x-direction. Since the
marginal total for any row is the total frequency corresponding to
a given value of y, it is therefore written in the column headed f(y).
Thus, in Table 32, for y = 92,

$$\sum_x f(x, y) = \sum_x f(x, 92) = 1 + 2 + 3 + 1 = 7.$$

Similarly, $\sum_y f(x, y)$ denotes a summation in the y-direction of all the
entries in a column, corresponding to a fixed value of x, so it denotes
an entry in the bottom row which contains the f(x) frequencies.
Thus, for x = 67,

$$\sum_y f(67, y) = 4 + 3 + 2 + 3 + 1 = 13.$$

Summarizing,

(19) $\sum_x f(x, y) = f(y); \qquad \sum_y f(x, y) = f(x).$

With regard to N, we may obtain it from a correlation table in
three ways: (1) by adding the entries across the rows and then
totaling the resulting sums in the marginal column labeled f(y);
(2) by adding the entries along the columns and then totaling the
results in the marginal row labeled f(x); (3) by adding the entries
in the cells in any order whatsoever. Hence, the following notation,

(20) $\sum_y\sum_x f(x, y) = \sum_x\sum_y f(x, y) = \sum_{x,y} f(x, y) = N,$

will denote, respectively, the above-named procedures or orders in
summing. From (19) and (20) we have

(21) $N = \sum_y f(y) = \sum_x f(x) = \sum_{x,y} f(x, y).$

We may call f(x) and f(y) the marginal distributions of x and y,
respectively. A correlation table with cell frequencies f(x, y)
uniquely determines the marginal totals f(x) and f(y). The con-
verse, however, is false. For example, we might replace the four
cell frequencies in the upper right-hand corner of Table 32 by the cell
frequencies

    2  2
    2  4

without disturbing the marginal totals.
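
Relations (19) to (21), and the remark that the marginal totals do not determine the cells, can be checked on Table 32 with the following Python sketch. It is an illustration only and not part of the original text; the names `freq`, `f_x`, `f_y`, and `altered` are assumptions.

```python
# Cell frequencies f(x, y) of Table 32. Rows run from y = 92 down to y = 62,
# columns from x = 67 up to x = 97; empty cells are entered as 0.
freq = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

f_y = [sum(row) for row in freq]          # (19): sum over x gives f(y)
f_x = [sum(col) for col in zip(*freq)]    # (19): sum over y gives f(x)
N = sum(sum(row) for row in freq)         # (20): sum over all cells

# Replace the upper right-hand corner [[3, 1], [1, 5]] by [[2, 2], [2, 4]].
altered = [row[:] for row in freq]
altered[0][5:7] = [2, 2]
altered[1][5:7] = [2, 4]
altered_f_y = [sum(row) for row in altered]
altered_f_x = [sum(col) for col in zip(*altered)]

print(f_y, f_x, N)
print(altered_f_y == f_y, altered_f_x == f_x)
```

The altered table has different cell frequencies but the same marginal distributions, as stated above.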


11. Means and Variances. We will now express the means in
terms of this notation, taking first the mean of x's. From the funda-
mental definition, we must multiply each x by its corresponding
frequency in the cells and sum the results, taking the products in any
order whatsoever. Hence,

$$\bar{x} = \frac{1}{N}\sum_{x,y} x f(x, y).$$

This may also be written

$$\bar{x} = \frac{1}{N}\sum_x\sum_y x f(x, y) = \frac{1}{N}\sum_x x\sum_y f(x, y) = \frac{1}{N}\sum_x x f(x).$$

Observe that the x may be moved to the left of $\sum_y$ in the second
expression because x is treated as a constant in a summation per-
formed with respect to y.

Similarly, we have

$$\bar{y} = \frac{1}{N}\sum_{x,y} y f(x, y) = \frac{1}{N}\sum_y\sum_x y f(x, y) = \frac{1}{N}\sum_y y\sum_x f(x, y) = \frac{1}{N}\sum_y y f(y).$$

The student will observe that the last expression for the mean in each
case is identical with that given for a frequency distribution of one
variable, when allowance is made for the necessity of distinguishing
between variables.

Any column is an x array of y's, so the symbol $\bar{y}_x$ is appropriate
for the mean of a column. Similarly, $\bar{x}_y$ denotes the mean of a y
array of x's, i.e., of a row. We may now state the following theorem.¹

Theorem IV. The mean $\bar{y}$ for the whole table (in the y-direction)
is equal to the mean of the values $\bar{y}_x$ for the several columns when each $\bar{y}_x$
is weighted with the frequency in that column.

Proof: We are required to show that

$$\frac{1}{N}\sum_x f(x)\,\bar{y}_x = \bar{y}$$

where

$$\bar{y}_x = \frac{1}{f(x)}\sum_y y f(x, y).$$

Upon substituting in the first equation the value of $\bar{y}_x$ as given by the
second equation, we have

$$\frac{1}{N}\sum_x\sum_y y f(x, y) = \frac{1}{N}\sum_{x,y} y f(x, y) = \bar{y}.$$

It is suggested that the student state and prove a similar theorem
concerning $\bar{x}$.

In this new notation, the definitions of the variances become

$$\sigma_x^2 = \frac{1}{N}\sum_{x,y}(x - \bar{x})^2 f(x, y) = \frac{1}{N}\sum_x x^2 f(x) - \bar{x}^2;$$

$$\sigma_y^2 = \frac{1}{N}\sum_{x,y}(y - \bar{y})^2 f(x, y) = \frac{1}{N}\sum_y y^2 f(y) - \bar{y}^2.$$

¹ This is actually the same as Theorem IX on page 45, but it seems worth-
while to state and prove it in the new notation.
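
The formulas for the means and variances, and the statement of Theorem IV, can be checked numerically on Table 32. The Python sketch below is an illustration only and not part of the original text; all variable names are assumptions.

```python
# Class marks and cell frequencies of Table 32 (rows y = 92 ... 62, columns x = 67 ... 97).
x_marks = [67, 72, 77, 82, 87, 92, 97]
y_marks = [92, 87, 82, 77, 72, 67, 62]
freq = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

f_y = [sum(row) for row in freq]
f_x = [sum(col) for col in zip(*freq)]
N = sum(f_y)

# Means from the marginal distributions.
mean_x = sum(x * f for x, f in zip(x_marks, f_x)) / N
mean_y = sum(y * f for y, f in zip(y_marks, f_y)) / N

# Theorem IV: the column means, weighted by f(x), average to the mean of y.
col_means = [
    sum(y * f for y, f in zip(y_marks, col)) / sum(col)
    for col in zip(*freq)
]
weighted_mean = sum(fx * m for fx, m in zip(f_x, col_means)) / N

# Variances by the second form of each definition.
var_x = sum(x * x * f for x, f in zip(x_marks, f_x)) / N - mean_x ** 2
var_y = sum(y * y * f for y, f in zip(y_marks, f_y)) / N - mean_y ** 2

print(mean_x, mean_y, weighted_mean, var_x, var_y)
```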


Exercises 

1. Evaluate the following expressions in Table 32.
(a) For x = 82: $\sum_y y f(x, y)$, $f(x)$, $\bar{y}_x$.
(b) For y = 87: $\sum_x f(x, y)$, $\sum_x x f(x, y)$, $f(y)$, $\bar{x}_y$.

2. Refer to Table 27 (Chapter V) and let x be the number of a column. Express
the answers in the third and second lines from the bottom of the table in
terms of the notation of this section. Thus for x = 1,

$$\bar{y}_x = \frac{1}{f(x)}\sum_y y f(x, y) = \frac{1}{7}\,[85 + (75)2 + (65)2 + (55)2] = 67.86.$$

12. Computation of Means. Just as in the case of a one-way
frequency distribution it was found convenient to choose an arbi-
trary origin and take the class interval as the unit, so we now do
likewise. Let

(22) $u = \frac{1}{h}(x - x_0)$; i.e., $x = uh + x_0$.

Hence,

(23) $\bar{x} = \bar{u}h + x_0$

where

$$\bar{u} = \frac{1}{N}\sum_u u f(u).$$

Likewise, let

(24) $v = \frac{1}{k}(y - y_0)$; i.e., $y = vk + y_0$,

whence

(25) $\bar{y} = \bar{v}k + y_0$,

where

$$\bar{v} = \frac{1}{N}\sum_v v f(v).$$

Then a suitable form for computing the means of the x's and y's
is as follows:


   u             -3    -2    -1     0     1     2     3
   v  y \ x      67    72    77    82    87    92    97    f(v)  vf(v)
   3    92                          1     2     3     1       7     21
   2    87                    1     3     8     1     5      18     36
   1    82        4     4     6     4     9     1            28     28
   0    77        3     3     7     6     4                  23      0
  -1    72        2     3     5     6     1     1            18    -18
  -2    67        3     2                                     5    -10
  -3    62        1                                           1     -3
  f(x) = f(u)    13    12    19    20    24     6     6     100     54
  uf(u)         -39   -24   -19     0    24    12    18     -28


Computations:

$$\bar{u} = \frac{1}{N}\sum_u u f(u) = -.28,$$

whence

$$\bar{x} = 82 + 5(-.28) = 80.6.$$

$$\bar{v} = \frac{1}{N}\sum_v v f(v) = .54,$$

whence

$$\bar{y} = 77 + 5(.54) = 79.7.$$

In the table f(v) = f(y) and f(u) = f(x) because u and v are merely different ways
of describing the cells but in no way change the frequencies in those cells.
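
The coded computation above may be sketched in Python as follows. This is an illustration only and not part of the original text; the names `h`, `k`, `x0`, `y0` follow equations (22) to (25), with the origin taken at x = 82, y = 77 and a class interval of 5 for both variables, as in the computations above.

```python
h, x0 = 5, 82     # class interval and arbitrary origin for x
k, y0 = 5, 77     # class interval and arbitrary origin for y

u_vals = [-3, -2, -1, 0, 1, 2, 3]
f_u = [13, 12, 19, 20, 24, 6, 6]      # marginal row f(x) = f(u)
v_vals = [3, 2, 1, 0, -1, -2, -3]
f_v = [7, 18, 28, 23, 18, 5, 1]       # marginal column f(y) = f(v)

N = sum(f_u)
sum_uf = sum(u * f for u, f in zip(u_vals, f_u))
sum_vf = sum(v * f for v, f in zip(v_vals, f_v))

mean_u = sum_uf / N
mean_v = sum_vf / N
mean_x = mean_u * h + x0              # equation (23)
mean_y = mean_v * k + y0              # equation (25)

print(sum_uf, sum_vf, round(mean_x, 2), round(mean_y, 2))
```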


13. Computation of r. In the expressions of §10 and §11 the
(u, v) coordinates could have been used instead of (x, y). The use of
the former simplifies the computation of r. A preliminary discussion
of certain expressions will help in understanding the formula for r
to be used for a correlation table. Let us consider first the following
expression:

(a) $\sum_{u,v} uv f(u, v).$

This means: multiply the f in each cell by the u and v coordinates of
that cell and add the results, proceeding from cell to cell over the
whole table in any order whatsoever. But it may be more con-
venient to proceed in a definite order, say down the columns. Then
(a) becomes

(b) $\sum_u\sum_v uv f(u, v) = \sum_u u\sum_v v f(u, v).$

The expression $\sum_v v f(u, v)$ in the right member of (b) means: for
any u (i.e., for any column), multiply each f by its own v and add
the results. Let us denote this sum by V. Then the right member
of (b) means: multiply the V for each column by the u of that
column and add the results, proceeding from column to column
(i.e., summing in the u-direction). We may also obtain the same
result as in (a) by proceeding along the rows. Thus (a) may be
written

(c) $\sum_v\sum_u uv f(u, v) = \sum_v v\sum_u u f(u, v).$

The expression $\sum_u u f(u, v)$ means: for any v (i.e., for any row),
multiply each f in the row by its own u and add the results. If we
call this sum U, then the right member of (c) means: multiply
the U for each row by the v for that row and add the results, pro-
ceeding from row to row (i.e., summing in the v-direction).

We are now ready to derive the formula for r. 

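Since we are now dealing with a frequency distribution, the funda-
mental definition of r becomes

(26) $r = \dfrac{\frac{1}{N}\sum_{x,y}(x - \bar{x})(y - \bar{y})\,f(x, y)}{\sigma_x\sigma_y}.$

From (22) and (23), we have

$(x - \bar{x}) = h(u - \bar{u}),$

and from (24) and (25),

$(y - \bar{y}) = k(v - \bar{v}).$

Since (x, y) and (u, v) are merely different notations for the same
cell, we have

$f(x, y) = f(u, v).$

For computing purposes, the standard deviations are defined as
follows:

$\sigma_x = h\sigma_u$, where $\sigma_u^2 = \frac{1}{N}\sum_u u^2 f(u) - \bar{u}^2$;

$\sigma_y = k\sigma_v$, where $\sigma_v^2 = \frac{1}{N}\sum_v v^2 f(v) - \bar{v}^2$.

Therefore, (26) becomes

$$r = \frac{\frac{1}{N}\sum_{u,v}(u - \bar{u})(v - \bar{v})\,f(u, v)}{\sigma_u\sigma_v}.$$

If now we let

$U = \sum_u u f(u, v)$ and $V = \sum_v v f(u, v),$

then since

$$\sum_{u,v} uv f(u, v) = \sum_v v\sum_u u f(u, v) = \sum_u u\sum_v v f(u, v),$$

the above expression for r may be written in either of the following
ways:

(29) $r = \dfrac{\frac{1}{N}\sum_v vU - \bar{u}\bar{v}}{\sigma_u\sigma_v} = \dfrac{\frac{1}{N}\sum_u uV - \bar{u}\bar{v}}{\sigma_u\sigma_v}.$

The fact that

$$\sum_v vU = \sum_u uV$$

serves as a check in the table.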

The above procedure is illustrated in Table 35.

Table 35. Computation of r for Data of Table 32

   u             -3    -2    -1     0     1     2     3
   v  y \ x      67    72    77    82    87    92    97    f(v)  vf(v)  v^2 f(v)     U    vU
   3    92                          1     2     3     1       7     21        63    11    33
   2    87                    1     3     8     1     5      18     36        72    24    48
   1    82        4     4     6     4     9     1            28     28        28   -15   -15
   0    77        3     3     7     6     4                  23      0         0   -18     0
  -1    72        2     3     5     6     1     1            18    -18        18   -14    14
  -2    67        3     2                                     5    -10        20   -13    26
  -3    62        1                                           1     -3         9    -3     9
  f(u)           13    12    19    20    24     6     6     100     54       210   -28   115
  uf(u)         -39   -24   -19     0    24    12    18     -28
  u^2 f(u)      117    48    19     0    24    24    54     286
  V              -7    -3     3     7    30    11    13
  uV             21     6    -3     0    30    22    39     115

Explanation: The table is self-explanatory except possibly the U and V entries.
Recalling that $U = \sum_u u f(u, v)$, the first entry in the U column is obtained from
the sum of the following products: 0·1 + 1·2 + 2·3 + 3·1 = 11; the second
entry from -1·1 + 0·3 + 1·8 + 2·1 + 3·5 = 24. Since $V = \sum_v v f(u, v)$ the first
entry in the V row is obtained from 1·4 + 0·3 - 1·2 - 2·3 - 3·1 = -7.
Similarly, for the other entries.

Computations:

$$\sigma_u^2 = \frac{1}{N}\sum_u u^2 f(u) - \bar{u}^2 = 2.86 - (-.28)^2 = 2.7816.$$

$$\sigma_u = \sqrt{2.7816} = 1.67.$$

$$\sigma_v^2 = \frac{1}{N}\sum_v v^2 f(v) - \bar{v}^2 = 2.10 - (.54)^2 = 1.8084.$$

$$\sigma_v = \sqrt{1.8084} = 1.34.$$

Therefore from (29) we have

$$r = \frac{1.15 - (-.28)(.54)}{(1.67)(1.34)} = 0.58.$$

14. Remarks on Computation of r. (a) Sign of r. It should be
observed that the sign of r depends on the choice of the positive di-
rection along each coordinate axis. In Table 35 the origin of refer-
ence is chosen so that the data occur in the first quadrant and the
directions on the (x, y)-axes are the conventional ones. These
directions were preserved in changing to (u, v) coordinates. If we
had reversed the direction of the y-axis by labeling the y values
larger than y = 77 by v = -1, -2, -3, and those less than y = 77
by v = 1, 2, 3, the sign of r would be changed. But if the directions
of both u and v were reversed, the sign of r would be unchanged.

(b) Grouping errors. When N is small, say less than 100, and
the data are grouped into cells, grouping errors are introduced. In
general, the fewer cells used, the greater the errors. These may be
corrected, in part, by applying Sheppard's corrections to $\sigma_u$ and
$\sigma_v$. However, this will not be insisted upon in this course.

(c) Commercial charts . Computations can be expedited by the 
use of commercially prepared correlation charts. Several types of 
chart are available on the market. In her book (reference 15), 
Professor Helen M. Walker explains the merits of two of these which 
are recommended. She also gives the following advice to beginners: 
“ A chart is not a crutch to help the novice. It is a means of speed- 
ing up operations after they are well understood.” 
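
The computation of r by formula (29), together with remark (a) on the sign of r, can be sketched in Python for the data of Table 32. This is an illustration only and not part of the original text; the function name `coded_r` and the variable names are assumptions.

```python
import math

u_vals = [-3, -2, -1, 0, 1, 2, 3]
v_vals = [3, 2, 1, 0, -1, -2, -3]
# Cell frequencies f(u, v): rows follow v_vals, columns follow u_vals.
freq = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

def coded_r(u_vals, v_vals, freq):
    """Return (r, U, V, sum of vU, sum of uV) computed as in Table 35."""
    f_v = [sum(row) for row in freq]
    f_u = [sum(col) for col in zip(*freq)]
    n = sum(f_v)
    # U for each row and V for each column.
    U = [sum(u * f for u, f in zip(u_vals, row)) for row in freq]
    V = [sum(v * f for v, f in zip(v_vals, col)) for col in zip(*freq)]
    sum_vU = sum(v * Ui for v, Ui in zip(v_vals, U))
    sum_uV = sum(u * Vi for u, Vi in zip(u_vals, V))
    mean_u = sum(u * f for u, f in zip(u_vals, f_u)) / n
    mean_v = sum(v * f for v, f in zip(v_vals, f_v)) / n
    var_u = sum(u * u * f for u, f in zip(u_vals, f_u)) / n - mean_u ** 2
    var_v = sum(v * v * f for v, f in zip(v_vals, f_v)) / n - mean_v ** 2
    r = (sum_vU / n - mean_u * mean_v) / math.sqrt(var_u * var_v)
    return r, U, V, sum_vU, sum_uV

r, U, V, sum_vU, sum_uV = coded_r(u_vals, v_vals, freq)
# Reversing the direction of the v-axis only changes the sign of r.
r_reversed = coded_r(u_vals, [-v for v in v_vals], freq)[0]
print(round(r, 2), sum_vU, sum_uV, round(r_reversed, 2))
```

Because the sketch carries the standard deviations without rounding, its value of r may differ from the hand computation in the third decimal place.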

Exercises 

1. By equation (29), show that r is independent of the choice of origin and of 

the units of measurement. 

2. In Table 35, evaluate the following sums:

$\sum_u f(u, 2)$, $\sum_v f(2, v)$, $\sum_u u f(u, 1)$, $\sum_v v f(-2, v)$, $\sum_u\sum_v uv f(u, v)$, $\sum_{u,v} f(u, v)$,

$\sum_u f(u, v)$ if v = 0.

3. Derive (29).

4. For the table on page 200, find r and $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$. Note that $x_0$, $y_0$, h,
and k do not need to be determined to compute r, but are required
for the means and standard deviations of x and y.

Heights and Weights of 200 Freshmen
(Heights to Nearest 1/10 Inch; Weights to Nearest 1/2 Pound)

  y \ x     90-  100-  110-  120-  130-  140-  150-  160-  170-  180-  190-  200-   f(y)
           99.5                                                              209.5
 76-77.9                        1                                                      1
 74-                                              1     1     1     1                  4
 72-                            1     1     1     4           1                        8
 70-                      1     2     6     7     6     2     1     2     1     1     29
 68-                      2     8    17     8     9     2     1     1     1           49
 66-                      8    16    14    13     6     2     1                 1     61
 64-                3     8     7     7     3     3     1     1                       33
 62-          1     4     1     7     1                                               14
 60-                                                                                   0
 58-59.9            1                                                                  1
 f(x)         1     8    20    42    46    32    29     8     6     4     2     2    200

Ans. $\bar{x} = 138.45$ lbs.; $\bar{y} = 67.82$ in.
$\sigma_x = 19.6$ lbs.; $\sigma_y = 2.8$ in.
r = 0.48.


15. Regression Lines for a Correlation Table. The data of a
correlation table may be thought of as dots lying many deep at the
centers of the several cells. There are, of course, f(x, y) of these in
any cell whose coordinates are (x, y), and f(x) is the total number of
dots in a vertical column whose coordinate is x. Suppose now we
replace all the data in each column by an equal number of data con-
centrated at the mean of that column. If we denote the ordinate of
this mean point by $\bar{y}_x$, we have

(30) $\bar{y}_x = \frac{1}{f(x)}\sum_y y f(x, y).$

Hence, $\bar{y}_x f(x)$ represents the totality of all the values in a column.

For each of the columns there will be a value of (30). Taking the
hypothesis that the mean points of the several columns lie approxi-
mately on a straight line $\bar{y}_x = m_1x + k$, we may find $m_1$ and k under a
least-squares criterion of approximation. If, in applying the criterion,
the square of the difference between the observed mean, ${}_o\bar{y}_x$, and the
computed mean, ${}_c\bar{y}_x$, for each array, viz., $(\bar{y}_x - m_1x - k)^2$, is weighted
with the number f(x) in the array, it turns out that we get the same
values for $m_1$ and k which we obtained when we fitted the regression
line of y on x to the scatter diagram.

In proving this, the student of calculus¹ would have an easy task
in obtaining the normal equations:

(31) $\sum_x(\bar{y}_x - m_1x - k)\,f(x) = 0$
     $\sum_x(\bar{y}_x - m_1x - k)\,x f(x) = 0$

whose simultaneous solution yields the desired values of $m_1$ and k.
Expanding (31), we have

(32) $\sum_x\bar{y}_x f(x) - m_1\sum_x x f(x) - k\sum_x f(x) = 0$
     $\sum_x\bar{y}_x x f(x) - m_1\sum_x x^2 f(x) - k\sum_x x f(x) = 0.$

Since

$$\sum_x\bar{y}_x f(x) = \sum_x\sum_y y f(x, y) = N\bar{y},$$

and

$$\sum_x\bar{y}_x x f(x) = \sum_x x\sum_y y f(x, y) = \sum_{x,y} xy f(x, y),$$

equation (32) becomes

(33) $N\bar{y} - m_1N\bar{x} - Nk = 0$
     $\sum_{x,y} xy f(x, y) - m_1\sum_x x^2 f(x) - kN\bar{x} = 0.$

Solving (33) for $m_1$ and k we find

$$k = \bar{y} - m_1\bar{x}$$

$$m_1 = \frac{\sum_{x,y} xy f(x, y) - N\bar{x}\bar{y}}{\sum_x x^2 f(x) - N\bar{x}^2} = r\frac{\sigma_y}{\sigma_x},$$

and the equation of our line becomes

$$\bar{y}_x = r\frac{\sigma_y}{\sigma_x}x + \bar{y} - r\frac{\sigma_y}{\sigma_x}\bar{x},$$

that is

(8a) $\bar{y}_x - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$

¹ Differentiating partially $\sum_x f(x)(\bar{y}_x - m_1x - k)^2$ with respect to $m_1$ and k
respectively, and setting the results equal to zero, yields equation (31). Instead
of differentiating this expression one may expand it, regard the result as
a quadratic in both $m_1$ and k, and use the theorem of §3, Chapter VII, to
obtain (31).

Therefore, the best-fitting line for the means of the columns prop- 
erly weighted, and the best-fitting line for all the dots are one and 
the same straight line. But from the point of view of a correlation 
table, a regression line is to be regarded as the equation from which 
may be estimated the average of all the y’s associated with a particular 
value of x . In other words, a prediction in the latter case professes 
to give only the mean result (Figure 38). 
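
The identity of the two lines can be checked numerically. The Python sketch below is an illustration only and not part of the original text: it fits a line to the column means of Table 32, each weighted by its column frequency, and compares the result with the regression line of y on x computed from all the cells. All variable names are assumptions.

```python
x_marks = [67, 72, 77, 82, 87, 92, 97]
y_marks = [92, 87, 82, 77, 72, 67, 62]
freq = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
columns = list(zip(*freq))
f_x = [sum(col) for col in columns]
N = sum(f_x)

# Column means, as in equation (30).
col_means = [sum(y * f for y, f in zip(y_marks, col)) / sum(col) for col in columns]

# Weighted least squares on the column means: solve the two normal equations.
s_xf = sum(x * f for x, f in zip(x_marks, f_x))
s_x2f = sum(x * x * f for x, f in zip(x_marks, f_x))
s_yf = sum(m * f for m, f in zip(col_means, f_x))
s_xyf = sum(x * m * f for x, m, f in zip(x_marks, col_means, f_x))
det = N * s_x2f - s_xf ** 2
m1_cols = (N * s_xyf - s_xf * s_yf) / det
k_cols = (s_x2f * s_yf - s_xf * s_xyf) / det

# Regression of y on x from all the dots: slope = covariance / variance of x.
mean_x = s_xf / N
mean_y = sum(y * sum(row) for y, row in zip(y_marks, freq)) / N
cov = sum(
    freq[i][j] * (x_marks[j] - mean_x) * (y_marks[i] - mean_y)
    for i in range(len(y_marks))
    for j in range(len(x_marks))
) / N
var_x = s_x2f / N - mean_x ** 2
m1_dots = cov / var_x
k_dots = mean_y - m1_dots * mean_x

print(round(m1_cols, 4), round(k_cols, 2), round(m1_dots, 4), round(k_dots, 2))
```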



Fig. 38. The Line of Regression of y on x is the Best-Fitting Line for
the Means of the Columns

16. Applications. The data of a correlation table are usually re- 
garded as a sample of the much larger class of similar data consti- 
tuting the universe. A regression equation calculated from a limited 
but representative sample may give valuable estimates of the average 
values of y in the universe associated with designated values of x. 

Let us consider the data of Table 36 on page 203. Suppose a
personnel manager in charge of hiring employees of a manufacturing
plant has instituted a system of mental tests for applicants, and has
gathered these data showing the relationship between the standing
made by applicants on their mental tests and their productive ability
when measured according to a certain standard of production after
they are hired.


Table 36

[Table body not recoverable.]








Here x represents the grade made on mental test, and y the per cent of standard
in production. (See also Table 27.) The means of columns are denoted by $\bar{y}_x$,
and the means of rows by $\bar{x}_y$.

In order to demonstrate to the company’s management the con- 
nection between his mental tests and the productivity of the em- 
ployees he has hired, the personnel manager does the following: 

(1) Computes the coefficient of correlation between the two series; 

(2) Shows what the estimated productivity of employees would be 
whose grades in the mental test fell on the mid-points of the class 
intervals of the mental test data. 

The means of the columns and of the rows are given in the table. 
In addition, he obtains the following results: 

$\bar{x} = 42.17$, $\sigma_y = 17.41$, $r = .417$,
$\bar{y} = 87.31$, $\sigma_x = 8.40$, $m_1 = r\frac{\sigma_y}{\sigma_x} = .864$.

Therefore, the line of regression of y on x is

$\bar{y}_x - 87.31 = .864(x - 42.17)$
or

(34) $\bar{y}_x = .864x + 50.88.$

This is the equation of the line that best fits the points which desig-
nate the means of the columns (Figure 39). Hence, for an assigned
value of x, equation (34) gives the value of y which is the expected
mean of the column defined by the assigned value of x. The personnel
manager is thus prepared to predict the productivity of applicants
on the basis of their mental test grades. In other words, the regres-
sion equation calculated from the records of those already hired may
be used in selecting from future applicants those most likely to


Fig. 39. Means of Columns and Line of Regression
of y on x for Table 36
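
Equation (34) can be applied to any assigned test grade. The Python sketch below is an illustration only and not part of the original text; the function name `predicted_mean_productivity` is an assumption, and the coefficients are those of equation (34).

```python
def predicted_mean_productivity(x):
    """Expected mean per cent of standard production for test grade x, by (34)."""
    return 0.864 * x + 50.88

# A few class marks of the mental test grades.
for grade in (22.5, 27.5, 32.5):
    print(grade, round(predicted_mean_productivity(grade), 2))
```

For the grade 32.5 the result is 78.96. Each value is the expected mean of a column, not a prediction for a single applicant.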

Exercises 

1. Verify the value of r given for Table 36.

2. Verify the means of the columns given in Table 36.

3. Using equation (34) show what the estimated productivity of employees
in the factory referred to above would be whose mental test grades were
22.5, 27.5, etc.

4. For Table 35,

(a) Find the equations of the regression lines.

(b) Locate the axes through the mean of the table and graph the regression
lines.

(c) Compute $S_y$.

5. As in Exercise 4 for the table on page 200.

Ans. to (a),

$\bar{y}_x = .069x + 58.3$
$\bar{x}_y = 3.36y - 89.4.$

¹ The critical reader may doubt if the value r = .417 is sufficiently large to
warrant much confidence in (34) as a predicting equation. The question of
reliability of predictions is discussed later.

17. $S_y$ for a Correlation Table. For ungrouped data we have
defined $S_y$ as a measure of the clustering of the data around the
regression line, and have observed that it is called the standard error
of estimate. In order to understand what $S_y$ has to do with “esti-
mates” it is necessary first to consider its meaning in a correlation
table. Let us denote by $s_{y\cdot x}$ the standard error about the regression
line in the array of y's at x. Thus we have

(35) $s_{y\cdot x}^2 = \frac{1}{f(x)}\sum_y({}_oy - {}_c\bar{y}_x)^2 f(x, y)$

where ${}_oy$ denotes an observed y value and ${}_c\bar{y}_x$ denotes the value
obtained from the regression line for that column. Thus, for the
column headed 32.5 in Table 36 we obtain the computed value
${}_c\bar{y}_x$ by substituting x = 32.5 in (34) whence we find ${}_c\bar{y}_x = 78.96$.
To evaluate $s_{y\cdot x}^2$ for this column we find the square of the deviation
of each of the 32 values of ${}_oy$ from 78.96, add the results and divide
by 32. Extracting the square root of the result we find $s_{y\cdot x} = 15.96$.
Moving along the regression line suppose we have computed an $s_{y\cdot x}^2$
for each array of y's and averaged the results. It is interesting to
learn that this average is $S_y^2$. This is stated more precisely in the
following theorem.

Theorem V. The arithmetic mean of the values of $s_{y\cdot x}^2$ for the several
columns when each $s_{y\cdot x}^2$ is weighted with the frequency in that column is
$S_y^2 = \sigma_y^2(1 - r^2)$.

Proof: Using (35) we have

$$\frac{1}{N}\sum_x f(x)\,s_{y\cdot x}^2 = \frac{1}{N}\sum_x\sum_y({}_oy - {}_c\bar{y}_x)^2 f(x, y).$$

Substituting the value given by (8a), §15, in the right member of the
above identity we have

$$\frac{1}{N}\sum_x\sum_y\left[y - r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) - \bar{y}\right]^2 f(x, y),$$

that is

$$\frac{1}{N}\sum_x\sum_y\left\{(y - \bar{y}) - r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2 f(x, y)$$

which reduces to $\sigma_y^2(1 - r^2)$. It is left as an exercise for the student
to show this.
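
Theorem V can be checked numerically on the data of Table 32 (Table 36 itself is not reproduced here). The Python sketch below is an illustration only and not part of the original text; all variable names are assumptions.

```python
x_marks = [67, 72, 77, 82, 87, 92, 97]
y_marks = [92, 87, 82, 77, 72, 67, 62]
freq = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
columns = list(zip(*freq))
f_x = [sum(col) for col in columns]
f_y = [sum(row) for row in freq]
N = sum(f_x)

mean_x = sum(x * f for x, f in zip(x_marks, f_x)) / N
mean_y = sum(y * f for y, f in zip(y_marks, f_y)) / N
var_x = sum(f * (x - mean_x) ** 2 for x, f in zip(x_marks, f_x)) / N
var_y = sum(f * (y - mean_y) ** 2 for y, f in zip(y_marks, f_y)) / N
cov = sum(
    freq[i][j] * (x_marks[j] - mean_x) * (y_marks[i] - mean_y)
    for i in range(len(y_marks))
    for j in range(len(x_marks))
) / N
r_squared = cov * cov / (var_x * var_y)
m1 = cov / var_x                      # slope r * sigma_y / sigma_x of line (8a)

# s_{y.x}^2 for each column, as in (35), about the regression line.
s2_cols = []
for x, col, fx in zip(x_marks, columns, f_x):
    computed = mean_y + m1 * (x - mean_x)
    s2_cols.append(sum(f * (y - computed) ** 2 for y, f in zip(y_marks, col)) / fx)

weighted_mean_s2 = sum(fx * s2 for fx, s2 in zip(f_x, s2_cols)) / N
S_y_squared = var_y * (1.0 - r_squared)

print(round(weighted_mean_s2, 4), round(S_y_squared, 4))
```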

For Table 36 we find $S_y = 15.83$. In Figure 40 the parallel lines
on either side of the regression line RR' are drawn at a vertical
distance of $\pm S_y$ from it. They describe the average limits of
scatter above and below the regression line.

To connect $S_y$ with the reliability of predictions it is necessary
to introduce the concept of a correlation surface. Indeed,
a knowledge of the fundamental properties of a correlation sur-
face is desirable for a wider outlook on correlation theory in general.

18. Normal Correlation Surface. A correlation table may be 
idealized into a surface in somewhat the same way that a histogram 
is idealized into a frequency curve. The concept of a surface relates 
to the universe from which the observed data of the table may be 
regarded as a sample. Let the dimensions of the cells of a table be
$\Delta x$ and $\Delta y$, and suppose columns are erected upon these cells with
altitudes proportional to the frequencies in the cells. The result is
a sort of solid histogram. Then as $\Delta x \to 0$, $\Delta y \to 0$, $N \to \infty$, the
tops of the columns approach as a limit a smooth surface which is
called a correlation surface. Our discussion will be confined to the
case where we may assume that this limit is a normal correlation
surface. In discussing this surface it is convenient to let x and y
represent deviations from the respective means and to let z = f(x, y)
denote the frequency function representing the surface. Such a
surface is shown in Figure 41. 

Any section of this surface parallel to the yz-plane is a normal
curve and represents the distribution in a column at x. Similarly,
any section parallel to the xz-plane, representing a row, is a normal
curve. The frequency in a cell is measured by that portion of the
volume under the surface which lies over that cell. All those cells
in which the frequency is a fixed value lie on an ellipse. That is, if
contour lines are drawn on the surface joining the points of equal
height above the base they will be ellipses. In other words, sections
of the surface parallel to the xy-plane are ellipses.

Fig. 41 - Frequency Surface for Correlated Variables

We will digress here for a brief discussion of an ellipse. We may
think of an ellipse as a transitional figure between a circle and a
straight line, as the circle flattens out. That is to say, the limiting
form of an ellipse is a circle at one extreme of the flattening
process and a straight line segment at the other extreme.

Fig. 42

The degree of flatness is called the eccentricity of the ellipse,
and it is proved in analytic geometry that the eccentricity
varies from zero in the case of a circle to unity when the ellipse
degenerates into a line. All ellipses having the same eccentricity,
whatever their size, have the same relative proportions and are
therefore similar in form.

The eccentricity of the elliptical contours of different normal
correlation surfaces varies with the amount of correlation existing in
the corresponding universe. A surface with narrow elliptical
contours represents a universe in which there is high correlation, whereas
if the variables are completely independent in the probability sense
the contour lines are circles when the variables are expressed in
standard units. If the variables are not expressed in standard units
(and r = 0) then the contour lines may be ellipses but their major
and minor axes will coincide with the x- and y-axes as in Figure 42.
When $r \ne 0$ the axes of the ellipses make an angle with the x- and y-axes;
their major axis cuts quadrants I and III in the xy-plane if r > 0 (as
in Figure 41) and quadrants II and IV if r < 0.

19. Properties of Normal Bivariate Surface. The equation of a 
normal correlation surface is given by 

(36)    $f(x, y) = K e^{-P}$

where

$P = \dfrac{1}{2(1 - r^2)} \left( \dfrac{x^2}{\sigma_x^2} - \dfrac{2rxy}{\sigma_x \sigma_y} + \dfrac{y^2}{\sigma_y^2} \right),$

$K = N \div \left( 2\pi \sigma_x \sigma_y \sqrt{1 - r^2} \right)$, and x and y represent the correlated
variables referred to their respective means as origin.
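As a rough numerical illustration of (36), the sketch below evaluates the surface and checks that the volume under it is approximately N. The function names `normal_surface` and `volume`, the midpoint-rule integration, and the parameter values are assumptions made for the illustration only; they are not part of the text.

```python
import math

def normal_surface(x, y, sigma_x, sigma_y, r, n=1.0):
    """Height f(x, y) = K * exp(-P) of the normal correlation surface (36).

    x and y are deviations from their means; n plays the role of N.
    """
    one_minus_r2 = 1.0 - r * r
    p = (x * x / sigma_x ** 2
         - 2.0 * r * x * y / (sigma_x * sigma_y)
         + y * y / sigma_y ** 2) / (2.0 * one_minus_r2)
    k = n / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_r2))
    return k * math.exp(-p)

def volume(sigma_x, sigma_y, r, n=1.0, half_width=8.0, steps=400):
    """Midpoint-rule volume under the surface over +/- half_width standard deviations."""
    dx = 2.0 * half_width * sigma_x / steps
    dy = 2.0 * half_width * sigma_y / steps
    total = 0.0
    for i in range(steps):
        x = -half_width * sigma_x + (i + 0.5) * dx
        for j in range(steps):
            y = -half_width * sigma_y + (j + 0.5) * dy
            total += normal_surface(x, y, sigma_x, sigma_y, r, n)
    return total * dx * dy

# Hypothetical parameters: the total frequency under the surface should be close to N.
total = volume(sigma_x=2.0, sigma_y=3.0, r=0.6, n=100.0)
peak = normal_surface(0.0, 0.0, 2.0, 3.0, 0.6, n=100.0)  # equals K at the origin
```

The peak height at the origin is K itself, since P = 0 there.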

By means of (36) an observed distribution may be fitted with the
appropriate normal surface, assuming that the sample might reasonably
have come from such a universe. This is accomplished by
replacing $\sigma_x$, $\sigma_y$, r, and N in (36) by the corresponding statistics
calculated from the sample and taking the origin at the mean of the
table. Let us assume that an observed distribution has been graduated
by such a surface and the theoretical cell frequencies obtained.
The surface extends to infinity in the xy-plane but contour ellipses
can be obtained which will enclose any desired percentage of the
given frequency when these ellipses are projected orthogonally onto
the xy-plane. They are all concentric, similar, and similarly placed.
Figure 43 represents such an ellipse, say the smallest one necessary
to enclose all the given cells. The systems of perpendicular chords
represent the columns and rows of the table.

The graduated frequencies for each column are normal distributions
whose means lie on the regression line of y on x and whose
standard deviations are in each case given by $S_y = \sigma_y (1 - r^2)^{1/2}$.
To state the same thing in a slightly different way, an array of y's
corresponding to a fixed value $x_1$ of x is a normal distribution whose
mean deviates from $\bar{y}$ by $r(\sigma_y / \sigma_x) x_1$ and whose standard deviation is
$S_y = \sigma_y (1 - r^2)^{1/2}$, which is independent of $x_1$ and therefore is the
same for all such arrays. Similarly an array of x's corresponding to
a particular value $y_1$ of y is a normal distribution with a mean which
deviates from $\bar{x}$ by $r(\sigma_x / \sigma_y) y_1$, and a standard deviation of
$S_x = \sigma_x (1 - r^2)^{1/2}$, which is independent of $y_1$ and therefore is the same
for all such arrays. A careful study of Figure 41 will help in
understanding what is meant by these statements.

When the means $\bar{y}_x$ of the columns fall exactly on the regression
line, $s_{y \cdot x}$ becomes the standard deviation of a column and is therefore
the same as $S_y$. Theorem V states that $S_y^2$ is an average of the values
of $s_{y \cdot x}^2$, but when all the quantities being averaged have the same
value, as they do in the ideal case of the normal surface, their (mean)
average is that value. When the standard deviations of the columns
are equal, the regression system of y on x is called a homoscedastic
system. In a universe where they are not equal the system is said to
be heteroscedastic. For a homoscedastic system with linear regression,
$S_y = \sigma_y (1 - r^2)^{1/2}$ is the standard deviation of each array of y's.

20. Reliability of Predictions. In using a regression equation to
make predictions we are naturally interested in the degree of
confidence to be expected in the predictions thus made. The use of $S_y$
in this connection is based upon the properties of the normal
correlation surface.

Let us imagine the universe of which Table 36 is a sample and
assume that it may be described by a normal surface. Confining
our attention to a section parallel to the yz-plane in Figure 41 we
know that an x array of y's is distributed normally about a value of
y determined by a designated value of x in the regression equation
of y on x. That is, the mean of this normal distribution is the
predicted value of y and its standard deviation is $S_y$. The
percentage distribution of such an array is the same as that given in
Figure 23 of Chapter VI, if $S_y$ is taken as the unit of measurement
along the horizontal axis. But an estimate of $S_y$ is its value
calculated from the sample. Moreover, for an observed distribution,
we have seen that $S_y$ is the average standard deviation of the several
columns and therefore it may reasonably be taken as an
approximation to the theoretical $S_y$ which in the universe is the same for
all the columns. We also take the calculated regression equation
as an approximation to the theoretical.

By measuring deviations from the predicted value in terms of
$S_y$, in the same way that $\sigma$ is used as a unit in measuring deviations
from the mean, we may then enter a normal probability scale for
the probability of a deviation involving multiples of $S_y$.
According to this scale the probability $P_y$ is about .68 for a
deviation of $\pm S_y$ from the predicted value, and the chances
are even for a deviation of $.6745\,S_y$ on either side of the
predicted value.

Fig. 44 - Representing an x Array of y's and Deviations of $\pm S_y$ from a
Predicted Value of y

For Table 36 we have found $S_y = 15.83$ and for an applicant
making x = 32.5 on the mental test we have predicted y = 78.96.
Therefore the chances are about 68 in 100 that his
percentage of productivity will be between 78.96 - 15.83 and
78.96 + 15.83, that is, between 63.13 and 94.79. In other words,
the probability is about .68 that the predicted value will not be in
error by more than 15.83.
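The arithmetic of this prediction can be retraced with a short sketch. It assumes the normal probability scale is the standard normal integral, obtained here from Python's `math.erf`; the helper name `prob_within` is introduced only for this illustration.

```python
import math

def prob_within(t):
    """Probability that a normal deviate lies within +/- t standard units."""
    return math.erf(t / math.sqrt(2.0))

predicted_y = 78.96   # predicted productivity for x = 32.5 (from the text)
s_y = 15.83           # standard error of estimate for Table 36 (from the text)

low, high = predicted_y - s_y, predicted_y + s_y   # limits for a deviation of one S_y
p_one = prob_within(1.0)                           # about .68
probable_error = 0.6745 * s_y                      # deviation with even chances
p_even = prob_within(0.6745)                       # about .50
```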

To summarize, in a normal bivariate universe each array is a 
normal distribution and therefore its mean coincides with its mode. 
Since regression is linear, a value predicted from the regression equa- 
tion of y on x is the mean value of y for a designated value of x. 


Then, $P_y = \int_{-t}^{t} \phi(t)\, dt$ is the probability for a deviation from the
predicted value $y_x$ as small as |t|, where t is expressed in units of the
standard error $S_y$ of a column. Thus, $t = (y - y_x) / S_y$.
Then $1 - P_y$ is the probability for a deviation as large as |t|.
Similarly, when dealing with the regression line of x on y,
$P_x = \int_{-t}^{t} \phi(t)\, dt$ is the probability for a deviation from the
predicted value $x_y$ as small as |t|, where now $t = (x - x_y) / S_x$.

Exercises 

1. Refer to problem 4, §4. Assume that the data given there are obtained
from a correlation table which is a representative sample from a normal
bivariate universe describing the heights and weights of senior men
students in colleges and universities of the United States. Then a value
predicted from the regression equation of y on x will give the mean of the
“column” at x. Similarly, for an assigned y, the corresponding x in the
regression equation of x on y will be the mean of the “row” at y. Under
this assumption, determine the probability that Doe's height is outside
the interval 65.75 to 77.75 inches. What are the chances that Roe will
be between 100.8 and 122.4 pounds in weight?

Ans. $1 - P_y = .0027$, $P_x = .5$ (approximately).

2. Discuss the reliability of the predictions which you made in Exercise 3, §16.

Outline of Solution. Suppose a reliability level of $P_y = .5$ is desired.
Making the necessary assumptions, this allows a deviation of $t = \pm .6745$.
Since $S_y = 15.83$ we have

$\pm .6745 = \dfrac{d}{15.83},$

where $d = y - y_x$. That is, $y = y_x \pm$ ? For x = 37.5, y = ? ± ?
So the probability is .5 that the standard of production will be between
what limits for a person making x = 37.5 on the mental test? The
problem is analogous for any other designated value of $P_y$ and for other
assigned values of x.


3. Consider the surface represented by (36). Prove that a section of the
surface parallel to the yz coordinate plane is a normal curve with its mean
on the regression line of y on x and with variance $S_y^2 = \sigma_y^2 (1 - r^2)$.

Outline of Solution. Write (36) in the form

(a)    $f = K e^{-P},$

where $P = (u^2 - 2ruv + v^2) / 2(1 - r^2)$, $u = x / \sigma_x$, $v = y / \sigma_y$, $\bar{x} = 0 = \bar{y}$.
The trace of the surface in the plane $u = u_1$ is determined by substituting
$u_1$ for u in (a). This substitution yields the result

(b)    $f = C e^{-T},$

where $T = (v - r u_1)^2 / 2(1 - r^2)$, $C = K e^{-u_1^2 / 2}$. Upon returning to (x, y)
coordinates, (b) becomes

(c)    $f = C e^{-h^2 (y - m)^2},$

where $m = r x_1 \sigma_y / \sigma_x$, $h^2 = 1 / (2 S_y^2)$, $S_y^2 = \sigma_y^2 (1 - r^2)$.
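The step from (a) to (b) rests on completing the square in v. A sketch of that algebra, using only the symbols of the outline, may be written as follows.

```latex
% Completing the square in the exponent P at u = u_1
\begin{align*}
u_1^2 - 2 r u_1 v + v^2 &= (v - r u_1)^2 + (1 - r^2)\, u_1^2, \\
P &= \frac{(v - r u_1)^2}{2(1 - r^2)} + \frac{u_1^2}{2} = T + \frac{u_1^2}{2}, \\
f &= K e^{-u_1^2/2}\, e^{-T} = C e^{-T}.
\end{align*}
% Returning to (x, y): v - r u_1 = (y - r x_1 \sigma_y/\sigma_x)/\sigma_y = (y - m)/\sigma_y
\begin{equation*}
T = \frac{(y - m)^2}{2 \sigma_y^2 (1 - r^2)} = \frac{(y - m)^2}{2 S_y^2} = h^2 (y - m)^2 .
\end{equation*}
```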

21. Non-Linear Regression. Correlation Ratio. We have seen 
that the regression systems of a normal correlation surface are linear. 
In a correlation table which is a representative sample from a normal 
bivariate universe the means of the arrays would lie approximately 
on straight lines. But in correlation tables which are samples of 
other types of universes, regression might not be linear. Moreover,
one of the regression curves might be strictly linear and the other 
non-linear. The following numerical example illustrates the latter 
possibility. 


  y \ x |  0    1    2    3  | f(y)
  ------+--------------------+-----
    2   |  0    0    0    2  |  2
    1   |  0    1    1    2  |  4
    0   |  1    1    0    0  |  2
  ------+--------------------+-----
   f(x) |  1    2    1    4  |  8

In this example, the regression of y on x is linear whereas that of x on
y is non-linear.
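The claim can be checked directly by computing the means of the columns and of the rows. The sketch below takes the cell frequencies as read from the small table of this section (row totals 2, 4, 2); the helper names are introduced only for the illustration.

```python
# Cell frequencies indexed as freq[y][i], where xs[i] is the column value.
xs = [0, 1, 2, 3]
freq = {
    2: [0, 0, 0, 2],
    1: [0, 1, 1, 2],
    0: [1, 1, 0, 0],
}

def column_means(xs, freq):
    """Mean of y in each column (array of y's for a fixed x)."""
    means = []
    for i, _ in enumerate(xs):
        n_col = sum(row[i] for row in freq.values())
        means.append(sum(y * row[i] for y, row in freq.items()) / n_col)
    return means

def row_means(xs, freq):
    """Mean of x in each row (array of x's for a fixed y), ordered by increasing y."""
    return [sum(x * f for x, f in zip(xs, freq[y])) / sum(freq[y])
            for y in sorted(freq)]

def successive_differences(values):
    return [b - a for a, b in zip(values, values[1:])]

col = column_means(xs, freq)          # equal steps: the means lie on a line
row = row_means(xs, freq)             # unequal steps: the means do not
col_steps = successive_differences(col)
row_steps = successive_differences(row)
```

Equal successive differences among the column means indicate linear regression of y on x; the unequal differences among the row means show that the regression of x on y is not linear.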

When the means of the columns (or of the rows) do not lie
approximately on a straight line, the use of r may be misleading because
r = 0 indicates absence of linear correlation only and not necessarily
absence of correlation in general.

One of the best treatments of this situation is that given in the
Carus Monograph on Mathematical Statistics, which will be
reproduced substantially here.

In introducing a correlation ratio, $\eta_{yx}$ (eta), of y on x, as an appropriate measure
of correlation to take the place of the correlation coefficient in such a situation,
we may get suggestions as to what is appropriate by solving for r in (10). This
gives

(37)    $r^2 = 1 - \dfrac{S_y^2}{\sigma_y^2},$

where we may recall that $S_y^2$ is the mean square of deviations from the line of
regression. Then

$r = \pm \left[ 1 - \dfrac{S_y^2}{\sigma_y^2} \right]^{1/2}.$

This formula could be used appropriately as a definition of r in place of our
definition in (1), and its examination may throw further light on the significance
of r. When $S_y = 0$, the formula gives $r = \pm 1$ and, as we have seen earlier,
all the dots of the scatter diagram must then fall exactly on the line of regression.
When $S_y = \sigma_y$, the formula gives r = 0, and the regression line is in this case of
no aid in predicting the value of y from assigned values of x. In the formula
$r^2 = 1 - S_y^2 / \sigma_y^2$ it is important to keep in mind that the mean square deviation
$S_y^2$ is from the line of regression. Next, let $S_y'^2$ be the corresponding mean
square of deviations from the means of columns. Then $S_y'^2 = S_y^2$ when the
regression is strictly linear, but $S_y'^2 < S_y^2$ when the regression is non-linear.
This fact suggests the use of a formula closely related to $[1 - S_y^2 / \sigma_y^2]^{1/2}$ for a
measure of non-linear regression by replacing $S_y$ by $S_y'$. We then write

(38)    $\eta_{yx}^2 = 1 - \dfrac{S_y'^2}{\sigma_y^2},$

where $\eta_{yx}$ is the correlation ratio of y on x, and $S_y'^2$ is the mean square of
deviations from the means of the columns whether these means are near to or far
from the line of regression.

In general, we may say that the correlation ratio of y on x is a measure of the 
clustering of dots about the means of columns. 

An analogous discussion for the rows obviously leads to

$\eta_{xy}^2 = 1 - \dfrac{S_x'^2}{\sigma_x^2},$

giving $\eta_{xy}^2$, the square of the correlation ratio of x on y.

That $\eta_{yx}^2 \le 1$ and that the equality holds only when all the dots in each
column are at the mean of the column follows at once from (38).

That $\eta_{yx}^2 \ge r^2$ may be shown by recalling the meanings of $S_y^2$ in (37) and
of $S_y'^2$ in (38). A mean square of deviations in each column is a minimum when
the deviations are taken from the mean of the array. Hence, the $S_y'^2$ in (38)
must be equal to or less than $S_y^2$ in (37) for the same data, since the deviations
in (37) are measured from the line of regression. Hence, we have shown that

$1 \ge \eta_{yx}^2 \ge r^2.$

Moreover, when the regression of y on x is linear, $\eta_{yx}^2 - r^2$ found from the sample
differs from zero by an amount not greater than the fluctuations due to random
sampling. Hence, $\eta_{yx}^2 - r^2$ becomes a criterion for testing the linearity of
the regression of y on x.

For computational purposes, it is desirable to express the correlation ratios
in a form involving the standard deviations of the means of arrays. For this
purpose, let $\bar{y}_x$ be the mean of any column of y's and $\sigma_{\bar{y}_x}$ the standard deviation
of the means of columns when the square $(\bar{y}_x - \bar{y})^2$ of each deviation is weighted
with the number f(x) in the column. Then it follows that

(39)    $\eta_{yx}^2 = \dfrac{\sigma_y^2 - S_y'^2}{\sigma_y^2} = \dfrac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}.$

That is, the correlation ratio of y on x is the ratio of the standard deviation of
the means of columns to the standard deviation of all y's.¹

To prove (39) we must show that $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$. We begin
by observing that the concentration of the dots in a column about
their mean may be measured in terms of their standard deviation.
Let $\sigma_{y \cdot x}$ denote the standard deviation of the y's in the column at x.
That is,

(40)    $\sigma_{y \cdot x}^2 = \dfrac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y).$

Now, the concentration of the dots in the entire table about the
means of the columns may be measured by finding the mean value
of all such expressions $\sigma_{y \cdot x}^2$ for all the columns of the table. But
since there are more points in some columns than in others, it will be
desirable to weight the $\sigma_{y \cdot x}^2$ for each column by multiplying it by
the number of points or dots in the column. It is this weighted
mean value of the $\sigma_{y \cdot x}^2$'s which we have denoted by $S_y'^2$. That is,

(41)    $S_y'^2 = \dfrac{1}{N} \sum_x f(x)\, \sigma_{y \cdot x}^2.$

In order to verify (39) we must now show that

$\sigma_y^2 = S_y'^2 + \sigma_{\bar{y}_x}^2.$

Adapting (14) of §9, Chapter V, to the notation of this chapter,
we have

(42)    $N \sigma_y^2 = \sum_x f(x)\, \sigma_{y \cdot x}^2 + \sum_x f(x) (\bar{y}_x - \bar{y})^2.$

This follows from the fact that N is composed of the several
sub-distributions f(x) in the columns, and $\sigma_{y \cdot x}$ is the standard deviation
of a column about its mean $\bar{y}_x$. It is obvious that

$\dfrac{1}{N} \sum_x f(x) (\bar{y}_x - \bar{y})^2$

gives the variance $\sigma_{\bar{y}_x}^2$ of the means of the columns. The above
expression (42) then becomes

$N \sigma_y^2 = N S_y'^2 + N \sigma_{\bar{y}_x}^2,$

which reduces to $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$, and hence we obtain (39).
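The decomposition behind (39) can be checked numerically. The sketch below uses the cell frequencies of the small table of §21 as (x, y, frequency) triples; the function name `variance_parts` and the variable names are assumptions introduced for the illustration.

```python
from collections import defaultdict

# Non-empty cells (x, y, frequency) of the small table of section 21.
cells = [(3, 2, 2), (1, 1, 1), (2, 1, 1), (3, 1, 2), (0, 0, 1), (1, 0, 1)]

def variance_parts(cells):
    """Return (sigma_y^2, S_y'^2, variance of the column means) for a table."""
    n = sum(f for _, _, f in cells)
    y_bar = sum(y * f for _, y, f in cells) / n
    var_y = sum((y - y_bar) ** 2 * f for _, y, f in cells) / n

    col_n = defaultdict(float)     # f(x) for each column
    col_sum = defaultdict(float)   # sum of y * f(x, y) for each column
    for x, y, f in cells:
        col_n[x] += f
        col_sum[x] += y * f
    col_mean = {x: col_sum[x] / col_n[x] for x in col_n}

    # (41): weighted mean of the column variances about the column means.
    within = sum((y - col_mean[x]) ** 2 * f for x, y, f in cells) / n
    # Variance of the column means about the general mean, weighted by f(x).
    between = sum(col_n[x] * (col_mean[x] - y_bar) ** 2 for x in col_n) / n
    return var_y, within, between

var_y, within, between = variance_parts(cells)
eta2_from_38 = 1.0 - within / var_y    # definition (38)
eta2_from_39 = between / var_y         # computational form (39)
```

For this table the two forms agree, and `within + between` reproduces `var_y`, which is the content of (42) after division by N.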

22. Computation of $\eta^2$. It should be instructive to compute
$\eta_{yx}^2$ for Table 36, by both relations (38) and (39).

¹ Rietz, Carus Monograph on Mathematical Statistics, p. 89 et seq.
For (38) we have the following:

$\eta_{yx}^2 = 1 - \dfrac{S_y'^2}{\sigma_y^2}$,    $S_y'^2 = \dfrac{1}{N} \sum_x f(x)\, \sigma_{y \cdot x}^2$,    $\sigma_{y \cdot x}^2 = \dfrac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y).$

    $\sigma_{y \cdot x}^2$     f(x)
    106.12¹       7
    191.83       14
    246.48       32
    283.63       49
    257.65       55
    294.51       54
    222.53       35
     71.43       14

$S_y'^2 = 246.45$

$\sigma_y^2 = (17.41)^2 = 303.11$

$\eta_{yx}^2 = 1 - \dfrac{246.45}{303.11} = .1869.$

For (39) we have the following:

$\eta_{yx}^2 = \dfrac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}$,    $\sigma_{\bar{y}_x}^2 = \dfrac{1}{N} \sum_x (\bar{y}_x - \bar{y})^2 f(x)$,    $\bar{y} = 87.31.$

    $\bar{y}_x$      f(x)
     67.86        7
     72.14       14
     81.87       32
     84.80       49
     85.73       55
     90.92       54
     95.57       35
    105.00       14

$\sigma_{\bar{y}_x}^2 = 56.66$ (see Exercise 3, p. 97).

$\eta_{yx}^2 = \dfrac{56.66}{303.11} = .1869.$

¹ See Table 27 and Table 16.

In verifying (39) for this example we have $\sigma_y^2 - S_y'^2 = 303.11 - 246.45 = 56.66$
and $\sigma_{\bar{y}_x}^2 = 56.66$.
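Both computations for Table 36 can be retraced from the column summaries quoted above. The sketch below takes the column variances, column means, frequencies, and $\sigma_y = 17.41$ from the text; the variable names are chosen for the illustration, and small differences in the last decimal place arise from rounding in the tabled values.

```python
# Column summaries for Table 36 as quoted in the text.
col_var = [106.12, 191.83, 246.48, 283.63, 257.65, 294.51, 222.53, 71.43]
col_mean = [67.86, 72.14, 81.87, 84.80, 85.73, 90.92, 95.57, 105.00]
freq = [7, 14, 32, 49, 55, 54, 35, 14]
sigma_y = 17.41

n = sum(freq)
var_y = sigma_y ** 2

# Relation (38): weighted mean of the column variances.
within = sum(f * v for f, v in zip(freq, col_var)) / n
eta2_38 = 1.0 - within / var_y

# Relation (39): variance of the column means about the general mean.
y_bar = sum(f * m for f, m in zip(freq, col_mean)) / n
between = sum(f * (m - y_bar) ** 2 for f, m in zip(freq, col_mean)) / n
eta2_39 = between / var_y

print(round(within, 2), round(y_bar, 2), round(eta2_38, 4), round(eta2_39, 4))
```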

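The above illustrations are useful in giving an understanding of
the meaning of $\eta_{yx}^2$. However, for computational purposes, another
formula may be derived which involves less labor than either (38)
or (39). In fact, the computation of a correlation ratio may be very
conveniently performed by an easy extension of a correlation table.
The derivation of the appropriate formula will now be given.

The standard deviation ($\sigma_{\bar{y}_x}$) of the means of the columns may be
expressed in the (u, v) units by the relation $\sigma_{\bar{y}_x}^2 = k^2 s_{\bar{v}_u}^2$, where

$s_{\bar{v}_u}^2 = \dfrac{1}{N} \sum_u f(u)\, \bar{v}_u^2 - \bar{v}^2,$

which is the definition of the standard deviation of the variable $\bar{v}_u$.
This is apparent if we observe that the mean for the whole table in
the v-direction ($\bar{v}$) is the mean of the quantities $\bar{v}_u$ for the several
columns.¹ Since

$\bar{v}_u = \dfrac{1}{f(u)} \sum_v v f(u, v) = \dfrac{V}{f(u)},$

V denoting the sum $\sum_v v f(u, v)$ for the column, we have

$s_{\bar{v}_u}^2 = \dfrac{1}{N} \sum_u \dfrac{V^2}{f(u)} - \bar{v}^2.$

Recalling that $\sigma_y^2 = k^2 \sigma_v^2$, we have

$\eta_{yx}^2 = \dfrac{s_{\bar{v}_u}^2}{\sigma_v^2},$

that is,

(43)    $\eta_{yx}^2 = \dfrac{1}{\sigma_v^2} \left[ \dfrac{1}{N} \sum_u \dfrac{V^2}{f(u)} - \bar{v}^2 \right].$

An analogous discussion for the rows of x's leads to

(44)    $\eta_{xy}^2 = \dfrac{1}{\sigma_u^2} \left[ \dfrac{1}{N} \sum_v \dfrac{U^2}{f(v)} - \bar{u}^2 \right],$

giving the square of the correlation ratio of x on y.

¹ $\dfrac{1}{N} \sum_u f(u)\, \bar{v}_u = \dfrac{1}{N} \sum_u f(u) \dfrac{1}{f(u)} \sum_v v f(u, v) = \dfrac{1}{N} \sum_u \sum_v v f(u, v) = \dfrac{1}{N} \sum_{u, v} v f(u, v) = \bar{v}.$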
Example. Find $\eta_{yx}^2$ for Table 35. Solution: Referring to this table and
using (43) we obtain the following results:

    $V^2$         49     9     9           900     121     169     Sum
    $V^2/f(u)$   3.78   .75   .47   2.45   37.50   20.17   28.17   93.29

$\bar{v}^2 = .2916$, $\sigma_v^2 = 1.8084$, N = 100.

$\eta_{yx}^2 = \dfrac{1}{1.8084} \left[ \dfrac{93.29}{100} - .2916 \right],$

$\eta_{yx}^2 = .3546.$
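The arithmetic of this example follows directly from (43). The sketch below takes the column values of $V^2/f(u)$, $\bar{v}^2$, $\sigma_v^2$, and N from the example; the function name `eta2_from_column_sums` is an assumption made for the illustration.

```python
def eta2_from_column_sums(v2_over_f, n, v_bar_sq, var_v):
    """Correlation ratio squared by (43): (sum(V^2 / f(u)) / N - v_bar^2) / sigma_v^2."""
    return (sum(v2_over_f) / n - v_bar_sq) / var_v

# Values of V^2 / f(u) for the columns of Table 35, as quoted in the example.
v2_over_f = [3.78, 0.75, 0.47, 2.45, 37.50, 20.17, 28.17]

eta2 = eta2_from_column_sums(v2_over_f, n=100, v_bar_sq=0.2916, var_v=1.8084)
print(round(sum(v2_over_f), 2), round(eta2, 4))
```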

It may be well to mention that the value of $\eta$ is not independent
of the classification of the data. As the class intervals become
narrower, $\eta$ approaches unity. This may be understood from (38).
If the grouping were so fine that only one item appeared in each
column, then it would constitute the mean of that column. In this
case $S_y'$ would be zero and $\eta$ would therefore be unity. On the other
hand, a very coarse grouping tends to make the value of $\eta$ approach r.
“Student” has given a formula for The Correction to be Made in the
Correlation Ratio for Grouping in Biometrika, vol. IX, pp. 316-320.

23. Further Discussion. Test for Linearity of Regression. Let
us consider the totality of mean points $(x, \bar{y}_x)$ of the columns and
think of a curve connecting them. Of course, for a table of observed
data, it is possible to draw many such curves. In order to show
clearly why a comparison of $\eta^2$ and $r^2$ is the basis of a test for linearity
of regression, it will be necessary to consider a theoretical table in
which there is only one such curve. When we speak of the regression
curve we are thinking, not of the given table in which the dimensions
of the cells are h and k, but of an ideal table in which there is an
infinity of cells of zero dimensions. To put it another way, consider
a sample of N pairs of values $(x_i, y_i)$ from which a correlation table
is made with cells whose dimensions are h and k. If parallelepipeds
are erected on the cells with heights proportional to the frequencies,
the result is a solid histogram bounded by a broken surface. As
$h \to 0$, $k \to 0$, and $N \to \infty$, this histogram will approach some solid,
bounded by a smooth surface. An example of such a surface is the
normal correlation surface. In such an ideal table, it is possible to
have but one curve connecting the means of the columns. This
curve is sometimes styled the true regression curve of y on x. In an
analogous way for the means of the rows there would be a true
regression curve of x on y. It is one of these curves that we have in
mind when we speak of “the regression curve” or “the regression.”
For a normal bivariate universe (represented by a normal correlation
surface), regression is linear. But for other types of bivariate
universes (which might be represented by skew surfaces), it is
conceivable that regression might be parabolic or exponential or
some other type of curve. In such types, regression is said to be
non-linear. The curve which is chosen to approximate the true
regression curve must not be confused with the true regression
curve. The latter notion relates to the ideal universe from which the
data at hand are a sample. It is defined as the locus of the mean
points of the columns of the theoretical table. When we fit a curve to
the means of the columns of an observed table, this regression curve
is merely an approximation to the ideal set up in the definition.
Similar statements may be made about the regression of x on y.

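We will now recapitulate the expressions used in the comparative
analysis of $r^2$ and $\eta_{yx}^2$ for an observed table.

(45)    $\sigma_{y \cdot x}^2 = \dfrac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y)$

        $S_y'^2 = \dfrac{1}{N} \sum_x \sigma_{y \cdot x}^2 f(x)$

        $\eta_{yx}^2 = 1 - \dfrac{S_y'^2}{\sigma_y^2} = \dfrac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}$

(46)    $s_{y \cdot x}^2 = \dfrac{1}{f(x)} \sum_y (y - y_x)^2 f(x, y)$

        $S_y^2 = \dfrac{1}{N} \sum_x s_{y \cdot x}^2 f(x)$

        $r^2 = 1 - \dfrac{S_y^2}{\sigma_y^2}$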
Recall that $\sigma_{y \cdot x}^2$ is defined as the variance in a column and therefore
as the square of the standard error about the regression curve,
whatever it may be, which goes through the means of the columns. $S_y'^2$
is an average of the $\sigma_{y \cdot x}^2$ values, and $\eta_{yx}^2$ is defined in terms of $S_y'$.
Correspondingly, $s_{y \cdot x}^2$ is the square of the standard error in a column
about the line which best fits the means of the columns. $S_y^2$ is an
average of the $s_{y \cdot x}^2$ values, and $r^2$ is defined in terms of $S_y$. If
regression is linear, the means of the columns will fall on the
“best-fitting line” and $\sigma_{y \cdot x}^2$ becomes the same as $s_{y \cdot x}^2$. Then $S_y'^2 = S_y^2$,
and hence $\eta_{yx}^2 = r^2$.

It is interesting to observe that $\sigma_{y \cdot x}^2$ is the second moment about
the mean, for an array of y's, i.e., for a column. In the notation
of moments it could be denoted by $\mu_{2:y \cdot x}$. In this notation, $s_{y \cdot x}^2$
could be denoted by $\nu_{2:y \cdot x}$, being the second moment in an array
of y's about a point other than its mean. Since $\mu_2 \le \nu_2$, it follows
that $\sigma_{y \cdot x}^2 \le s_{y \cdot x}^2$. Therefore $S_y'^2 \le S_y^2$ and $\eta_{yx}^2 \ge r^2$. If each y
value of a column is at the mean of that column then it is
obvious that $\sigma_{y \cdot x}^2$ will be zero. In this case, $S_y' = 0$, and $\eta_{yx}^2 = 1$.
On the other hand, for any column, the contribution of $\sigma_{y \cdot x}^2 f(x)$ to
$S_y'^2$ cannot exceed its contribution to $\sigma_y^2$. Taking the weighted
mean of the respective contributions over all the columns, we have
$S_y'^2 \le \sigma_y^2$ and hence

$0 \le \eta_{yx}^2 \le 1.$

Writing (38) in the form

$S_y' = \sigma_y (1 - \eta_{yx}^2)^{1/2}$

we see that $S_y'$ is a measure of dispersion about the regression curve
(which is the locus of the means) corresponding to $S_y = \sigma_y (1 - r^2)^{1/2}$,
which is the standard error about the “best” line. If $r^2 = 1$, then
y is related to x by a linear function. If $\eta_{yx}^2 = 1$, it follows that y
is a single-valued function of x. On the other hand, if $r^2 = 0$, it does
not necessarily follow that there is no relation¹ between y and x. If
$\eta_{yx}^2 = 0$ then $r^2 = 0$, but if $r^2 = 0$ it does not necessarily follow that
$\eta_{yx}^2 = 0$.

In the ideal table, regression of y on x is linear if and only if
$\eta_{yx}^2 - r^2 = 0$. But in the case of an observed table, allowance must
be made for sampling fluctuations. A corresponding analysis could
be made for $r^2$ and $\eta_{xy}^2$, and $\eta_{xy}^2 - r^2$ computed from the sample should
differ from zero by an amount not greater than the fluctuations due
to chance, if regression of x on y is linear. The question naturally
arises, what discrepancy between the computed values of $\eta^2$ and $r^2$
may be tolerated before we conclude that regression is non-linear?
This problem has been investigated, and Blakeman¹ has proposed a
testing formula. If certain assumptions are made, a simple though
approximate test may be deduced from Blakeman's formula.
According to this approximate test if

(47)    $N(\eta^2 - r^2) < 11.4$

then linear regression may be assumed. Since there are two $\eta$'s there
are two tests. It is possible for one of the regression curves to be
linear and the other not.

¹ See H. L. Rietz, “On Functional Relations for which the Coefficient of
Correlation is Zero.” Journal American Statistical Association, vol. 16, 1919,
pp. 472-76.

Evaluating (47) for Table 35 we obtain $100\,[.3546 - (.58)^2] = 1.82$,
so the regression of y on x may be assumed to be linear.

R. A. Fisher has shown that the Blakeman test is not very reliable.
One can easily construct an example for which regression is obviously
non-linear yet which satisfies the criterion (47). Consider the
following table:



Here, N = 5, $\sum xy = 27$, $\bar{x} = 3$, $\bar{y} = 9/5$. From (3), therefore,
r = 0. From (40) and (41), $S_y' = 0$ and $\eta_{yx} = 1$. Applying (47),
Blakeman's test yields a verdict of linear regression of y on x. It
appears that Blakeman's criterion is of doubtful utility. A more
efficient method of testing linearity of regression is given in Part II.
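The two evaluations of (47) discussed above can be retraced in a few lines. The sketch below uses the values quoted in the text (Table 35, and the small counterexample with N = 5, r = 0, $\eta_{yx} = 1$); the function names and the constant name `BLAKEMAN_LIMIT` are introduced only for the illustration.

```python
BLAKEMAN_LIMIT = 11.4   # bound used in the approximate test (47)

def blakeman_statistic(n, eta2, r2):
    """Left member of (47): N * (eta^2 - r^2)."""
    return n * (eta2 - r2)

def passes_blakeman(n, eta2, r2):
    """True when (47) is satisfied, i.e. linear regression 'may be assumed'."""
    return blakeman_statistic(n, eta2, r2) < BLAKEMAN_LIMIT

# Table 35: N = 100, eta^2 = .3546, r = .58.
table_35 = blakeman_statistic(100, 0.3546, 0.58 ** 2)

# Counterexample from the text: N = 5, r = 0, eta = 1.
counterexample = blakeman_statistic(5, 1.0, 0.0)

print(round(table_35, 2), counterexample, passes_blakeman(5, 1.0, 0.0))
```

The counterexample satisfies (47) although its regression is plainly non-linear, because with so small an N the left member cannot reach the bound; this is the weakness the text points out.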

Exercises 

1. Using (43) and (44) find $\eta_{yx}^2$ and $\eta_{xy}^2$ for the table referred to in Exercise 4,
page 221. Apply the test (47) and state your opinion about the linearity
of regressions.

¹ See Handbook of Mathematical Statistics, Rietz and others, p. 131.

2. In the following table, x = Interest Rates, 4-6 months Commercial Paper;
y = Total Bills Discounted by Federal Reserve Banks (1923-1932). Find
r and $\eta_{yx}^2$. Form an opinion about linearity of regression of y on x. (Data
from Elements of Statistics, Davis and Nelson, page 288.)

Class 

Marks 

V 











7 







1 

6 

6 

6 

6 





1 

2 

3 

4 



5 





1 

3 

1 

2 



4 




2 


9 

4 

1 



3 


1 

2 

D 

4 

9 

4 




2 


1 


S 

11 

5 

1 




1 

4 


2 

3 

3 

1 





0 

2 

3 

3 

5 

3 






Class 

Marks 

X 

0 

1 

2 


4 

5 

6 

7 

8 

9 


3. In §44, Statistical Methods for Research Workers, R. A. Fisher writes: “The
sum of the squares of the deviations of all the values of y from their
general mean may be broken up into two parts, one representing the sum of
the squares of the deviations of the means of the arrays from the general
mean, each multiplied by the number in the array, while the second is the
sum of the squares of the deviations of each observation from the mean of
the array in which it occurs.” [Compare with our (14a) of Chapter V.]
Prove Fisher's statement. Hint. In symbols, you are to prove that

$V = v_1 + v_2$

where

$V = \sum_{x, y} (y - \bar{y})^2 f(x, y),$

$v_1 = \sum_x (\bar{y}_x - \bar{y})^2 f(x),$

$v_2 = \sum_{x, y} (y - \bar{y}_x)^2 f(x, y).$

4. Prove that $\eta_{yx}^2$ is the ratio between $v_1$ and V as defined in Exercise 3.

5. The mortality experience during the early years of an insurance company
presents an interesting study in correlation. The following table shows
for male lives the correlation between the ages (x) of the insured at issue
of policy and his age (y) at death. Data of the Midland Life Insurance
Company,¹ 1906-1924.




24. Correlation from Ranks. Before defining rank we will find the
variance of the difference, z, between corresponding values of two
variables. Let x and y denote corresponding values of two series each
consisting of N variates. Form a third series z where $z_i = x_i - y_i$.
Then the mean of z is given by $\bar{z} = \bar{x} - \bar{y}$ and the standard
deviation of z is, by definition, given by

$\sigma_z^2 = \dfrac{1}{N} \sum z^2 - \bar{z}^2.$



1 From a paper On Certain Applications of Mathematical Statistics to Actuarial 
Data in The Record, American Institute of Actuaries, vol. XIII, Part II, 
No. 28, November, 1924. 








Replacing z by its equal x - y, we have

$\sigma_z^2 = \dfrac{1}{N} \sum (x^2 - 2xy + y^2) - \bar{x}^2 - \bar{y}^2 + 2\bar{x}\bar{y}$

$\quad = \left\{ \dfrac{1}{N} \sum x^2 - \bar{x}^2 \right\} - 2 \left\{ \dfrac{1}{N} \sum xy - \bar{x}\bar{y} \right\} + \left\{ \dfrac{1}{N} \sum y^2 - \bar{y}^2 \right\}.$

Whence

(48)    $\sigma_z^2 = \sigma_x^2 - 2 r \sigma_x \sigma_y + \sigma_y^2.$

If the variables x and y are uncorrelated, we have as a special case

$\sigma_z^2 = \sigma_x^2 + \sigma_y^2.$

Solving (48) for r, we obtain

(49)    $r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_z^2}{2 \sigma_x \sigma_y}.$

This is another expression for the correlation coefficient and involves
standard deviations only. In particular, it may be used to advantage
when x and y denote ranks, where by rank we mean order of
magnitude or importance. That is, rank refers to the position of a variate
in an arrangement.

If x and y denote the ranks of the same item with respect to two
characteristics, and no ranks are omitted, and there are no
duplications of ranks, then both x and y refer to the integers from 1 to N.
Therefore, $\bar{x} = \bar{y}$, and $\sigma_x^2 = \dfrac{1}{12}(N^2 - 1) = \sigma_y^2$. See Theorem VI,
Chapter V. Moreover,

$\sigma_z^2 = \dfrac{1}{N} \sum (z - \bar{z})^2 = \dfrac{1}{N} \sum (x - y)^2 - (\bar{x} - \bar{y})^2 = \dfrac{1}{N} \sum (x - y)^2$, since $\bar{x} = \bar{y}$.

Let R denote the correlation coefficient when x and y refer to ranks
rather than variates. Then (49) becomes

$R = \dfrac{\dfrac{N^2 - 1}{6} - \dfrac{1}{N} \sum (x - y)^2}{2 \left( \dfrac{N^2 - 1}{12} \right)},$

which simplifies into

(50)    $R = 1 - \dfrac{6 \sum (x - y)^2}{N(N^2 - 1)}.$

This is known as Spearman's formula for rank correlation.

If two or more variates are tied it is customary to divide the
corresponding rank numbers among the variates concerned, using
fractions if necessary.


Example. Suppose we have the following scores made in two tests, arranged
in the order of their rank. Find the correlation between ranks.

    Individual    1st Subject          2nd Subject          x - y    (x - y)^2
                  Score   Rank x       Score   Rank y
        A          92       1           85       2           -1         1
        B          86       2           76       4           -2         4
        C          84       3           93       1            2         4
        D          78       4           68       6           -2         4
        E          71       5           67       7           -2         4
        F          69       6           83       3            3         9
        G          66       7           54       9           -2         4
        H          58       8           70       5            3         9
        I          53       9                   10           -1         1
        J          45      10           59       8            2         4

      N = 10                                                 Total     44

We find $R = 1 - \dfrac{6(44)}{10(99)} = .733$.
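Spearman's formula (50) and the customary treatment of ties can be sketched as follows. The ranks are those of the example above; the tie illustration uses a small hypothetical list of scores, and the function names `average_ranks` and `spearman_r` are assumptions introduced for the illustration.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, sharing ranks among ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2.0   # mean of the rank numbers in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_r(rank_x, rank_y):
    """Rank correlation by (50): 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(rank_x)
    d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Ranks of individuals A to J in the example.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_y = [2, 4, 1, 6, 7, 3, 9, 5, 10, 8]
R = spearman_r(rank_x, rank_y)

# Hypothetical scores with a tie: the two 5's share ranks 2 and 3.
tied = average_ranks([5, 2, 5, 8])
```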
Exercises

1. Suppose z = x + y. How would this change formulas (48) and (49)?

2. Twelve salesmen are ranked in order of merit for efficiency by their
manager. They are also ranked in accordance with their length of service.
What indication is there of a relation between length of service and
efficiency? (Garrett.)

            Years of    Order of Merit    Order of Merit
Salesmen    Service       (Service)          (Effic.)

   A           5             7.5                6
   B           2            11.5               12
   C          10             2                  1
   D           8             4                  9
   E           6             6                  8
   F           4             9                  5
   G          12             1                  2
   H           2            11.5               10
   I           7             5                  3
   J           5             7.5                7
   K           9             3                  4
   L           3            10                 11

The fractions in the third column denote ties in rank. Thus, A and J each
served 5 years and each is ranked 7.5. The next individual is ranked 9.
Ans. R = .80.

3. Find R for the following data:

        Rank   Score     Rank   Score

  A       1      92        2      88
  B       2      89        4      85
  C       3      87        1      93
  D       4      86        6      79
  E       5      83        7      70
  F       6      77        3      87
  G       7      71        9      52
  H       8      62        5      84
  I       9      53       10      41
  J      10      40        8      64

Ans. R = .733.

26. Interpretation. Common Elements. Although statistical
theory gives a description of the indicated relationship between two
related variables, interpretations of the results "abound in pitfalls
easily overlooked by the unwary, while they are cantering gaily
along upon their arithmetic."

The methodological side has been developed until we can find correlation
coefficients by simply turning a crank, but the explanation of the meaning of
the result after we find it, needs a brain. . . . No amount of mathematical
training and ability can take the place of the judgment and common sense that
comes from a knowledge of the field in which the problem lies. 1

In the interpretation of r one should avoid imputing any causal 
relationship between the variables. In this connection the following 
pungent remarks of Professor E. B. Wilson 2 may be appropriately 
quoted: 

Correlation is a mutual affair between two numerical variables; the correlation 
coefficient r is symmetrical with respect to them. Strictly, y is not correlated 
with x or x with y , but x and y are correlated. Theory is very important in 
indicating what facts should be looked for as significant; facts are significant 
or important largely as they indicate theory, but neither compels the other, as 
the histories of theorizing and of fact finding amply demonstrate. . . . Further,
the value of the correlation coefficient depends on the group for which it is deter- 
mined or on the universe of which that group is a fair sample. The correlation 
coefficient r of height and weight for a group containing humans from infancy to 
adult life would be different from, and in fact greater than, the coefficient for 
college students or for the members of a football squad; there is no such thing 
as the correlation coefficient per se. 

If the student has mastered the underlying mathematical theory 
he should be able to understand and profit by the interpretations 
given by the writers in his particular field of interest. As a final 
aid in forming a conception of its meaning, we state a theorem which 
gives to r a meaning in pure chance. If x and y are affected by s
equally likely causes of which t are common to both, then r = t/s.

Theorem VI. An urn containing white and black balls is so maintained
that in drawing a ball the probability of getting a white ball is a
constant p and that of getting a black ball is q (= 1 - p). The first
drawing of a pair of drawings is to consist of s balls taken one at a time
from the urn. The second drawing is to consist of s balls of which t are
taken at random from the s first drawn, and s - t are drawn one at a time
from the urn. Then the correlation coefficient between the numbers of
white balls in the two drawings is t/s.
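
The theorem can be made plausible by a small simulation. The following Python sketch is an illustration, not a proof. It assumes s = 5, t = 3, and p = 1/4, treats each draw from the urn as an independent trial with constant probability p, and uses an arbitrary seed and number of trials. The function names are hypothetical.

```python
import math
import random

def draw_pair(s, t, p, rng):
    """One pair of drawings; returns (white balls in first, white balls in second)."""
    first = [rng.random() < p for _ in range(s)]       # True marks a white ball
    kept = rng.sample(first, t)                        # t balls taken at random from the first s
    fresh = [rng.random() < p for _ in range(s - t)]   # s - t new balls from the urn
    return sum(first), sum(kept) + sum(fresh)

def correlation(pairs):
    """Product-moment correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)                                  # fixed seed for repeatability
pairs = [draw_pair(5, 3, 0.25, rng) for _ in range(50_000)]
r_simulated = correlation(pairs)
print(round(r_simulated, 2), "versus t/s =", 3 / 5)
```

With a large number of trials the simulated coefficient should lie close to t/s, whatever value of p is chosen.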

1 Crathorne in Journal of the American Statistical Association, vol. 26,
Supplement, March, 1931, p. 27.

2 Correlation and Association, Journal of the American Statistical Association,
vol. 26 (1931), pp. 250-256.

As an illustration of the theorem we will take s = 5, t = 3, p = 1/4.
Let x be the number of white balls in the first drawing and y the
number of white balls in the second drawing. Then Table 37,
constructed by the theory of probability, 1 exhibits the a priori
frequencies when we use as small numbers as possible for frequencies
subject to the condition that each frequency is to be an integer.

Table 37. A Priori Frequencies

 y \ x       0       1       2       3      4     5      f(y)

   5         0       0       0       9      6     1        16
   4         0       0      81     108     45     6       240
   3         0     243     648     432    108     9      1440
   2       243    1620    1728     648     81     0      4320
   1      1458    3159    1620     243      0     0      6480
   0      2187    1458     243       0      0     0      3888

 f(x)     3888    6480    4320    1440    240    16    16,384


According to the theorem the correlation coefficient should be 3/5.
It is left as an exercise for the student to show, by computing r from
the table, that this is actually the case.
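
A computed check of this exercise is sketched below in Python. The frequencies are those of Table 37, keyed by y with one entry for each x from 0 to 5. The variable names are illustrative, and the moments are taken with divisor N.

```python
from math import sqrt

# Table 37: for each y, the frequencies for x = 0, 1, ..., 5.
table = {
    5: [0, 0, 0, 9, 6, 1],
    4: [0, 0, 81, 108, 45, 6],
    3: [0, 243, 648, 432, 108, 9],
    2: [243, 1620, 1728, 648, 81, 0],
    1: [1458, 3159, 1620, 243, 0, 0],
    0: [2187, 1458, 243, 0, 0, 0],
}

n = sum(sum(row) for row in table.values())
sum_x = sum(f * x for row in table.values() for x, f in enumerate(row))
sum_y = sum(f * y for y, row in table.items() for f in row)
sum_xx = sum(f * x * x for row in table.values() for x, f in enumerate(row))
sum_yy = sum(f * y * y for y, row in table.items() for f in row)
sum_xy = sum(f * x * y for y, row in table.items() for x, f in enumerate(row))

mean_x, mean_y = sum_x / n, sum_y / n
var_x = sum_xx / n - mean_x ** 2
var_y = sum_yy / n - mean_y ** 2
cov_xy = sum_xy / n - mean_x * mean_y

r_table = cov_xy / sqrt(var_x * var_y)
print(n, r_table)
```

The same sums can of course be formed by hand from the marginal totals f(x) and f(y) and the cell products.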

Review Questions and Problems 

1. Define the following terms: statistics, variate, discrete, class interval,
class mark, x-array of y's, range, regression line, sample, universe,
coefficient of variation, variance.

2. Name and define five averages. Discuss their advantages and limitations. 

3. What does a ratio chart show that a chart with a uniform scale does not? If
you wished to plot data so as to secure the effect of a ratio chart, but had
no ratio paper available, how would you accomplish the desired result?

4. Prove the following:

(a) The algebraic sum of the deviations of the variates from their mean
is zero.

(b) The second moment about an arbitrary point equals the second moment
about the mean increased by the square of the distance between
the arbitrary point and the mean.

1 Explained in Part II. 
5. (a) Define and explain how to compute the following:

$Q_1$, $Q_2$, $Q_3$, MD, $S$, $\sigma$.

(b) In the case of a normal distribution give the value of each of the first
four constants in (a) in terms of $\bar{x}$ or $\sigma$.

6. (a) Give the equation of the normal curve in both arbitrary coordinates and 

standard units. State the relation between abscissas and between ordi- 
nates in the two systems. 

(b) State the properties of the normal curve. 

7. Show how to fit a straight line y = mx + k by the method of moments by
deriving the expressions for m and k.

8. Show how to fit an exponential function by the method explained in the 

text. 

9. Show how to fit a parabola by the method of moments. 

10. (a) Give two of the formulas for r. Discuss the use or uses of correlation in
any problem that occurs to you.

(b) Show that the slope of the line in problem 7 may be written $r\sigma_y/\sigma_x$.

11. (a) Prove that $|r| \le 1$.

(b) Define the correlation ratio. Discuss its use.

12. Discuss rank correlation. 

13. Derive the following relations:

$\bar{x} = c\bar{u} + x_0$
$\mu_2 = \nu_2 - \nu_1^2$
$\mu_{2:x} = c^2\mu_{2:u}$
$\sigma_x = c\sigma_u$

14. The following is a reduced distribution of the breakfast checks at a cafeteria.
Using the indirect method find $\bar{x}$ and $\sigma_x$.

    x        f

   8-12      4
  13-17      8
  18-22     24
  23-27     21
  28-32     15
  33-37     14
  38-42      7
  43-47      4
  48-52      2
  53-57      1

Ans. $\bar{x} = 27.2$¢, $\sigma = 9.4$¢.
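
The indirect method for this distribution can be sketched in Python as follows. The provisional origin x0 = 25 is an arbitrary choice made for the illustration (any class mark would serve), the class marks are taken as the midpoints 10, 15, ..., 55, and the variable names are hypothetical.

```python
from math import sqrt

class_marks = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]   # midpoints of 8-12, 13-17, ...
freqs = [4, 8, 24, 21, 15, 14, 7, 4, 2, 1]

x0 = 25   # provisional origin (arbitrary choice for this sketch)
c = 5     # class interval

# Coded deviations u = (x - x0) / c are small integers.
u = [(x - x0) // c for x in class_marks]

n = sum(freqs)
mean_u = sum(f * ui for f, ui in zip(freqs, u)) / n
second_moment_u = sum(f * ui * ui for f, ui in zip(freqs, u)) / n
sigma_u = sqrt(second_moment_u - mean_u ** 2)

mean_x = x0 + c * mean_u   # x-bar = x0 + c * u-bar
sigma_x = c * sigma_u      # sigma_x = c * sigma_u
print(round(mean_x, 1), round(sigma_x, 1))
```

The coding keeps the arithmetic in small whole numbers; only the last two lines return to the original units.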

15. Derive the relations which give the third and fourth moments about the
mean in terms of moments about an arbitrary origin. Define $\alpha_3$ and $\alpha_4$.
What information do they give?

16. Compute the value of $\alpha_3$ and of $\alpha_4$ for the distribution in Exercise 14.
17. The following is a distribution of the heights of students where x denotes
heights in inches and f is the number of students of the corresponding
heights. Find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$.

    x       f

  60.5      1
  62.0      3
  63.5     14
  65.0     32
  66.5     61
  68.0     80
  69.5     71
  71.0     35
  72.5     24
  74.0      2
  75.5      1


18. For N values of a variable v it is known that $\sum v = 0$ and $\sum v^2 = N$. What
are the origin and unit of v?

19. Find in two ways the value of p for which the function

$V = \sum f(x - p)^2$

has the smallest value.

20. (Walker) An algebra test was given to 400 high school children, of whom
150 were boys and 250 were girls. The results were as follows:

$n_1 = 150$            $n_2 = 250$
$\bar{x}_1 = 72.5$         $\bar{x}_2 = 73.6$
$\sigma_1 = 7.0$           $\sigma_2 = 6.4$

Find the mean and standard deviation of the combined groups.
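
One way of organizing the computation for combined groups is sketched below in Python. It rests on the relation stated in problem 4(b): the second moment of each group about the origin is its variance plus the square of its mean. The function name `combine_groups` is illustrative.

```python
from math import sqrt

def combine_groups(groups):
    """groups: list of (n, mean, sd). Returns (mean, sd) of the pooled variates."""
    total_n = sum(n for n, _, _ in groups)
    pooled_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Second moment about the origin of each group is sd^2 + mean^2.
    second_moment = sum(n * (sd ** 2 + mean ** 2) for n, mean, sd in groups) / total_n
    pooled_sd = sqrt(second_moment - pooled_mean ** 2)
    return pooled_mean, pooled_sd

# Boys and girls of problem 20.
combined_mean, combined_sd = combine_groups([(150, 72.5, 7.0), (250, 73.6, 6.4)])
print(round(combined_mean, 2), round(combined_sd, 2))
```

Note that the combined standard deviation is not simply an average of the two group values, since the difference between the group means also contributes.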

21. For a normal distribution of 1500 students' grades, $\bar{x} = 75$, $\sigma_x = 10$. What
values of x will include the middle 500 grades? How many grades were
below 60; above 90?

22. Suppose a distribution of 1000 breakfast checks from the cafeteria mentioned
in problem 14 showed the following results: $\bar{x} = 27$¢, $\sigma_x = 9$¢, $\alpha_3 = 0$,
$\alpha_4 = 3$. On the basis of these results what is the expected frequency in
the 23-27¢ class interval?

23. Given the following data as to the heights (y) and weights (x) of college men:

$\sum y = 6,800$, $\sum y^2 = 463,025$, $\sum xy = 1,022,250$,

$\sum x = 15,000$, $\sum x^2 = 2,272,500$, $N = 100$.

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r.
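
When only the sums are given, as in problem 23, the constants follow directly from the moment formulas. The Python sketch below assumes that the three sums printed without labels are $\sum y^2$, $\sum xy$, and $\sum x^2$ respectively, which is the only assignment that gives positive variances. The variable names are illustrative.

```python
from math import sqrt

n = 100
sum_y, sum_yy = 6_800, 463_025
sum_x, sum_xx = 15_000, 2_272_500
sum_xy = 1_022_250

mean_x = sum_x / n
mean_y = sum_y / n
sigma_x = sqrt(sum_xx / n - mean_x ** 2)   # second moment about the mean
sigma_y = sqrt(sum_yy / n - mean_y ** 2)
r = (sum_xy / n - mean_x * mean_y) / (sigma_x * sigma_y)
print(mean_x, mean_y, sigma_x, sigma_y, round(r, 3))
```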

24. Derive the expression for the standard error of estimate,

$S_y = \sigma_y(1 - r^2)^{1/2}$.

25. Discuss the use of $S_y$ in predictions.

26. Compute the median, quartiles, and quartile deviation for the following
distribution where x = bushels per acre and f = corresponding frequency.


X 

f 

1 

3 

3 

26 

5 

78 

7 

107 

9 

113 

11 

65 

13 

40 

15 

22 

17 

45 

19 

41 

21 

21 

23 

23 


27. (a) Find r for the following table using (u, v) coordinates.

 y \ x     17    19    21    23    f(y)

  18              3     2     1      6
  15        2     4     3     1     10
  12        2     1     1            4

 f(x)       4     8     6     2     20

(b) For the above data, find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, and the equations of the
regression lines.

28. For Table 38, (a) find the correlation coefficient, (b) find the equations of the
lines of regression, (c) locate the coordinate axes through the arithmetic
mean of the table and plot the lines obtained in (b).

29. Fit an exponential function of the type $y = Ae^{Bx}$ to the following data:

  x      0      2      4
  y      2     10    100

First find the equation in the forms

(a) Y = at + b

(b) Y = mx + k

and then determine A and B.
Table 38. Correlation Table for Monthly Rainfall at Iowa City and
Des Moines, 1890-1925

IOWA CITY

y \ x: 0.245  0.745  1.245  1.745  2.245  2.745  3.245  3.745  4.245  4.745  5.245
       5.745  6.245  6.745  7.245  7.745  8.245  8.745  9.245  9.745  10.245  10.745  f(y)

10.245 



















1 




l 

9.745 
















1 







l 

9.245 

















1 

1 





2 

8.745 













1 






1 




2 

8.245 













2 


1 

1 







4 

7.745 











1 











1 

2 

7.245 










2 

1 


1 


2 

1 







7 

6.745 



2 


1 


2 


1 



1 



1 



1 

1 




10 

6.245 












1 











1 

5.745 






4 

1 


1 




1 

2 



1 






10 

5.245 



1 


1 

2 



2 


1 

fl 

I 

1 

1 




1 




12 

4.745 





2 

1 

2 

2 

1 

1 

1 

1 

1 



2 



1 




14 

4.245 



1 


1 

1 

2 


1 

4 

1 

2 

1 








■ 

■ 

14 

3.745 




4 

2 

1 

2 

6 

5 

3 

2 

2 

2 


1 






I 

i 

30 

3.245 


2 

~! 

4 

6 

6 

1 | 

7 

2 


1 






1 





i 

34 

2.745 



3 

4 

1 

8 

4 



4 

2 

1 

2 



1 









30 

2.245 



1 

5 

10 

7 

6 

4 

2 

2 


... 











37 

1.745 

1 

4 

7 

12 

13 

8 

5 

1 

1 


2 


1 










56 

1.245 

3 

8 

18 

17 

6 

8 

4 

2 


1 













67 

0.745 

6 

16 

21 

12 

6 

1 

1 

1 



1 



1 









66 

0.245 

13 

12 

3 

4 



















32 

f(x) 

23 

42 

58 

62 

49 

47 

32 

27 

18 

15 

14 

7 


5 

6 

5 

3 

2 

5 

0 

1 

1 

432 
30. How does the scatter diagram assist one in deciding whether the regression is
linear or non-linear? Give the formulas for the correlation coefficient
and for the correlation ratio of y on x, explaining the meaning of the letters
used. How would you use these indices of correlation to decide whether
the regression of y on x is linear or non-linear?

31. (a) In a normal distribution in which $\bar{x} = 0$ and $\sigma_x = 4$, what proportion
of the data lie where x > 12?

(b) If 100 of the data lie between x = -6 and x = -8, how many of the data
are there in the whole distribution?

32. (a) When the variates are ungrouped what is perhaps the best formula
for $\sigma_x$? Ans.

$\sigma_x = \dfrac{\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}}{N}$

(b) What does this expression become in terms of N when x refers to the
integers from 1 to N?

33. (a) Expand $(a + b + c + d)^2$.

(b) The expansion of $(x_1 + x_2 + \cdots + x_n)^2$ consists of the sum of the
squares of the x's plus twice the sum of their products taken two at a time.
Express this expansion in summation notation.

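34. (a) Show that the formula for MD may be written

$\mathrm{MD} = \dfrac{2}{N}\Big[\bar{x}\sum_{x_i<\bar{x}} f_i - \sum_{x_i<\bar{x}} f_i x_i\Big]$.

Hint. For $x_i < \bar{x}$, $\sum f_i|x_i - \bar{x}| = -\sum f_i(x_i - \bar{x}) = \sum f_i(\bar{x} - x_i) = \bar{x}\sum f_i - \sum f_i x_i$.

For $x_i > \bar{x}$, $\sum f_i|x_i - \bar{x}| = \sum f_i(x_i - \bar{x})$.

Since $\bar{x}$ is the centroid (§14, Chapter III), $\sum f_i(x_i - \bar{x})$ for $x_i > \bar{x}$ equals
$\sum f_i(\bar{x} - x_i)$ for $x_i < \bar{x}$.

(b) Using this formula evaluate MD for one of the distributions in the text.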
35. Given N pairs of variates: $(x_{11}, x_{21}), (x_{12}, x_{22}), (x_{13}, x_{23}), \ldots, (x_{1N}, x_{2N})$.
Show that:

(a) the mean $\bar{x}$ of all the variates is

$\bar{x} = \dfrac{1}{2N}\sum_{i=1}^{N}(x_{1i} + x_{2i})$,

(b) the variance $\sigma^2$ taken about the $\bar{x}$ in (a) is

$\sigma^2 = \dfrac{1}{2N}\Big[\sum_i (x_{1i} - \bar{x})^2 + \sum_i (x_{2i} - \bar{x})^2\Big]$.

Note. The quantity

$r' = \dfrac{1}{N\sigma^2}\sum_i (x_{1i} - \bar{x})(x_{2i} - \bar{x})$,

where $\bar{x}$ and $\sigma^2$ are defined as in (a) and (b), is called the intra-class
correlation coefficient. For its use see Statistical Methods for Research Workers,
Fisher (§38), Oliver and Boyd, London.

36. Let $S_r = \sum_{x=1}^{N} x^r$. Prove that $S_1 = N(N + 1)/2$,

$S_2 = N(N + 1)(2N + 1)/6$, $S_3 = S_1^2$.
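
Before attempting the proofs, the three identities for the sums of powers of the first N integers may be checked numerically. The Python sketch below is such a check, not a proof. It also evaluates the variance of the integers 1 to N, which should agree with the value $(N^2 - 1)/12$ used for ranks. The function name `power_sum` is illustrative.

```python
def power_sum(n, r):
    """S_r = 1^r + 2^r + ... + n^r."""
    return sum(x ** r for x in range(1, n + 1))

for n in range(1, 51):
    s1, s2, s3 = power_sum(n, 1), power_sum(n, 2), power_sum(n, 3)
    assert 2 * s1 == n * (n + 1)
    assert 6 * s2 == n * (n + 1) * (2 * n + 1)
    assert s3 == s1 ** 2
    # Variance of the integers 1..n with divisor n: S_2/n - (S_1/n)^2.
    variance = s2 / n - (s1 / n) ** 2
    assert abs(variance - (n * n - 1) / 12) < 1e-9

print(power_sum(10, 1), power_sum(10, 2), power_sum(10, 3))
```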

37. Sketch the graph of $y = Ae^{Bx}$, $-\infty < x < \infty$, when (a) both A and B are
positive, (b) A is positive and B negative, (c) A is negative and B positive,
(d) both A and B are negative.

38. A large number of rectangles are drawn all having the same perimeter but 

different bases (x) and altitudes (y). Which of the following is the cor- 
rect answer? The coefficient of correlation between x and y is (a) nega- 
tive and numerically large, ( b ) positive and numerically small, (c) positive 
and numerically large, (d) approximately zero. 

39. For N correlated values of x and y the regression equation of y on x is found
to be y = 1 + x. If $\bar{x} = 0$, r = 0.5, and $\sigma_x = 1$, determine $\bar{y}$ and $S_y$.

40. Let $NS_y^2$ denote the sum of squares of deviations from the line of least
squares (Case I).

(a) Show that $NS_y^2 = \sum y^2 - m\sum xy - k\sum y$.

Hint. $NS_y^2 = \sum (y - mx - k)^2$
$= \sum y(y - mx - k) - m\sum x(y - mx - k) - k\sum (y - mx - k)$.

The last two expressions vanish. Why?

(b) If m and k are replaced by their determinant values from (5), p. 143,
show that

$NS_y^2 = \dfrac{1}{D}\begin{vmatrix} \sum y^2 & \sum y & \sum xy \\ \sum y & N & \sum x \\ \sum xy & \sum x & \sum x^2 \end{vmatrix}$, where $D = \begin{vmatrix} N & \sum x \\ \sum x & \sum x^2 \end{vmatrix}$.

The third order determinant is D bordered by $\sum y^2$, $\sum y$, $\sum xy$.

(c) If x and y are replaced by x' and y', denoting deviations from their
respective means, find the values of the resulting determinants in (b).

(d) From the results in (c) show that $S_y^2 = \sigma_y^2(1 - r^2)$.

41. Discuss the properties of the normal correlation surface and their use in
passing judgment on the reliability of predictions based upon the regression
line of y on x.

42. (For calculus students) In fitting points in a plane by a line so that the
sum of squares of perpendicular deviations shall be a minimum, a second
line may be found for which the sum of squares of perpendicular deviations
is a maximum. If $\sum d_1^2$ is the sum of squares of deviations from the
first line and $\sum d_2^2$ is the sum of squares of deviations from the second line,
show that $\sum d_2^2 / \sum d_1^2 = (1 + r)/(1 - r)$. [Reference: Bulletin American
Mathematical Society, vol. 47 (1941), p. 710.]
Tables

I. Ordinates and Areas of the Normal Curve.

II. Common Logarithms of Numbers to Five Decimal Places.





Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$

   t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt

  .00   .39894   .00000       .45   .36053   .17364       .90   .26609   .31594
  .01   .39892   .00399       .46   .35889   .17724       .91   .26369   .31859
  .02   .39886   .00798       .47   .35723   .18082       .92   .26129   .32121
  .03   .39876   .01197       .48   .35553   .18439       .93   .25888   .32381
  .04   .39862   .01595       .49   .35381   .18793       .94   .25647   .32639
  .05   .39844   .01994       .50   .35207   .19146       .95   .25406   .32894
  .06   .39822   .02392       .51   .35029   .19497       .96   .25164   .33147
  .07   .39797   .02790       .52   .34849   .19847       .97   .24923   .33398
  .08   .39767   .03188       .53   .34667   .20194       .98   .24681   .33646
  .09   .39733   .03586       .54   .34482   .20540       .99   .24439   .33891
  .10   .39695   .03983       .55   .34294   .20884      1.00   .24197   .34134
  .11   .39654   .04380       .56   .34105   .21226      1.01   .23955   .34375
  .12   .39608   .04776       .57   .33912   .21566      1.02   .23713   .34614
  .13   .39559   .05172       .58   .33718   .21904      1.03   .23471   .34850
  .14   .39505   .05567       .59   .33521   .22240      1.04   .23230   .35083
  .15   .39448   .05962       .60   .33322   .22575      1.05   .22988   .35314
  .16   .39387   .06356       .61   .33121   .22907      1.06   .22747   .35543
  .17   .39322   .06749       .62   .32918   .23237      1.07   .22506   .35769
  .18   .39253   .07142       .63   .32713   .23565      1.08   .22265   .35993
  .19   .39181   .07535       .64   .32506   .23891      1.09   .22025   .36214
  .20   .39104   .07926       .65   .32297   .24215      1.10   .21785   .36433
  .21   .39024   .08317       .66   .32086   .24537      1.11   .21546   .36650
  .22   .38940   .08706       .67   .31874   .24857      1.12   .21307   .36864
  .23   .38853   .09095       .68   .31659   .25175      1.13   .21069   .37076
  .24   .38762   .09483       .69   .31443   .25490      1.14   .20831   .37286
  .25   .38667   .09871       .70   .31225   .25804      1.15   .20594   .37493
  .26   .38568   .10257       .71   .31006   .26115      1.16   .20357   .37698
  .27   .38466   .10642       .72   .30785   .26424      1.17   .20121   .37900
  .28   .38361   .11026       .73   .30563   .26730      1.18   .19886   .38100
  .29   .38251   .11409       .74   .30339   .27035      1.19   .19652   .38298
  .30   .38139   .11791       .75   .30114   .27337      1.20   .19419   .38493
  .31   .38023   .12172       .76   .29887   .27637      1.21   .19186   .38686
  .32   .37903   .12552       .77   .29659   .27935      1.22   .18954   .38877
  .33   .37780   .12930       .78   .29431   .28230      1.23   .18724   .39065
  .34   .37654   .13307       .79   .29200   .28524      1.24   .18494   .39251
  .35   .37524   .13683       .80   .28969   .28814      1.25   .18265   .39435
  .36   .37391   .14058       .81   .28737   .29103      1.26   .18037   .39617
  .37   .37255   .14431       .82   .28504   .29389      1.27   .17810   .39796
  .38   .37115   .14803       .83   .28269   .29673      1.28   .17585   .39973
  .39   .36973   .15173       .84   .28034   .29955      1.29   .17360   .40147
  .40   .36827   .15542       .85   .27798   .30234      1.30   .17137   .40320
  .41   .36678   .15910       .86   .27562   .30511      1.31   .16915   .40490
  .42   .36526   .16276       .87   .27324   .30785      1.32   .16694   .40658
  .43   .36371   .16640       .88   .27086   .31057      1.33   .16474   .40824
  .44   .36213   .17003       .89   .26848   .31327      1.34   .16256   .40988
   t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt

 1.35   .16038   .41149      1.80   .07895   .46407      2.25   .03174   .48778
 1.36   .15822   .41309      1.81   .07754   .46485      2.26   .03103   .48809
 1.37   .15608   .41466      1.82   .07614   .46562      2.27   .03034   .48840
 1.38   .15395   .41621      1.83   .07477   .46638      2.28   .02965   .48870
 1.39   .15183   .41774      1.84   .07341   .46712      2.29   .02898   .48899
 1.40   .14973   .41924      1.85   .07206   .46784      2.30   .02833   .48928
 1.41   .14764   .42073      1.86   .07074   .46856      2.31   .02768   .48956
 1.42   .14556   .42220      1.87   .06943   .46926      2.32   .02705   .48983
 1.43   .14350   .42364      1.88   .06814   .46995      2.33   .02643   .49010
 1.44   .14146   .42507      1.89   .06687   .47062      2.34   .02582   .49036
 1.45   .13943   .42647      1.90   .06562   .47128      2.35   .02522   .49061
 1.46   .13742   .42786      1.91   .06439   .47193      2.36   .02463   .49086
 1.47   .13542   .42922      1.92   .06316   .47257      2.37   .02406   .49111
 1.48   .13344   .43056      1.93   .06195   .47320      2.38   .02349   .49134
 1.49   .13147   .43189      1.94   .06077   .47381      2.39   .02294   .49158
 1.50   .12952   .43319      1.95   .05959   .47441      2.40   .02239   .49180
 1.51   .12758   .43448      1.96   .05844   .47500      2.41   .02186   .49202
 1.52   .12566   .43574      1.97   .05730   .47558      2.42   .02134   .49224
 1.53   .12376   .43699      1.98   .05618   .47615      2.43   .02083   .49245
 1.54   .12188   .43822      1.99   .05508   .47670      2.44   .02033   .49266
 1.55   .12001   .43943      2.00   .05399   .47725      2.45   .01984   .49286
 1.56   .11816   .44062      2.01   .05292   .47778      2.46   .01936   .49305
 1.57   .11632   .44179      2.02   .05186   .47831      2.47   .01889   .49324
 1.58   .11450   .44295      2.03   .05082   .47882      2.48   .01842   .49343
 1.59   .11270   .44408      2.04   .04980   .47932      2.49   .01797   .49361
 1.60   .11092   .44520      2.05   .04879   .47982      2.50   .01753   .49379
 1.61   .10915   .44630      2.06   .04780   .48030      2.51   .01709   .49396
 1.62   .10741   .44738      2.07   .04682   .48077      2.52   .01667   .49413
 1.63   .10567   .44845      2.08   .04586   .48124      2.53   .01625   .49430
 1.64   .10396   .44950      2.09   .04491   .48169      2.54   .01585   .49446
 1.65   .10226   .45053      2.10   .04398   .48214      2.55   .01545   .49461
 1.66   .10059   .45154      2.11   .04307   .48257      2.56   .01506   .49477
 1.67   .09893   .45254      2.12   .04217   .48300      2.57   .01468   .49492
 1.68   .09728   .45352      2.13   .04128   .48341      2.58   .01431   .49506
 1.69   .09566   .45449      2.14   .04041   .48382      2.59   .01394   .49520
 1.70   .09405   .45543      2.15   .03955   .48422      2.60   .01358   .49534
 1.71   .09246   .45637      2.16   .03871   .48461      2.61   .01323   .49547
 1.72   .09089   .45728      2.17   .03788   .48500      2.62   .01289   .49560
 1.73   .08933   .45818      2.18   .03706   .48537      2.63   .01256   .49573
 1.74   .08780   .45907      2.19   .03626   .48574      2.64   .01223   .49585
 1.75   .08628   .45994      2.20   .03547   .48610      2.65   .01191   .49598
 1.76   .08478   .46080      2.21   .03470   .48645      2.66   .01160   .49609
 1.77   .08329   .46164      2.22   .03394   .48679      2.67   .01130   .49621
 1.78   .08183   .46246      2.23   .03319   .48713      2.68   .01100   .49632
 1.79   .08038   .46327      2.24   .03246   .48745      2.69   .01071   .49643
   t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt      t     φ(t)   ∫₀ᵗφ(t)dt

 2.70   .01042   .49653      3.15   .00279   .49918      3.60   .00061   .49984
 2.71   .01014   .49664      3.16   .00271   .49921      3.61   .00059   .49985
 2.72   .00987   .49674      3.17   .00262   .49924      3.62   .00057   .49985
 2.73   .00961   .49683      3.18   .00254   .49926      3.63   .00055   .49986
 2.74   .00935   .49693      3.19   .00246   .49929      3.64   .00053   .49986
 2.75   .00909   .49702      3.20   .00238   .49931      3.65   .00051   .49987
 2.76   .00885   .49711      3.21   .00231   .49934      3.66   .00049   .49987
 2.77   .00861   .49720      3.22   .00224   .49936      3.67   .00047   .49988
 2.78   .00837   .49728      3.23   .00216   .49938      3.68   .00046   .49988
 2.79   .00814   .49736      3.24   .00210   .49940      3.69   .00044   .49989
 2.80   .00792   .49744      3.25   .00203   .49942      3.70   .00042   .49989
 2.81   .00770   .49752      3.26   .00196   .49944      3.71   .00041   .49990
 2.82   .00748   .49760      3.27   .00190   .49946      3.72   .00039   .49990
 2.83   .00727   .49767      3.28   .00184   .49948      3.73   .00038   .49990
 2.84   .00707   .49774      3.29   .00178   .49950      3.74   .00037   .49991
 2.85   .00687   .49781      3.30   .00172   .49952      3.75   .00035   .49991
 2.86   .00668   .49788      3.31   .00167   .49953      3.76   .00034   .49992
 2.87   .00649   .49795      3.32   .00161   .49955      3.77   .00033   .49992
 2.88   .00631   .49801      3.33   .00156   .49957      3.78   .00031   .49992
 2.89   .00613   .49807      3.34   .00151   .49958      3.79   .00030   .49992
 2.90   .00595   .49813      3.35   .00146   .49960      3.80   .00029   .49993
 2.91   .00578   .49819      3.36   .00141   .49961      3.81   .00028   .49993
 2.92   .00562   .49825      3.37   .00136   .49962      3.82   .00027   .49993
 2.93   .00545   .49831      3.38   .00132   .49964      3.83   .00026   .49994
 2.94   .00530   .49836      3.39   .00127   .49965      3.84   .00025   .49994
 2.95   .00514   .49841      3.40   .00123   .49966      3.85   .00024   .49994
 2.96   .00499   .49846      3.41   .00119   .49968      3.86   .00023   .49994
 2.97   .00485   .49851      3.42   .00115   .49969      3.87   .00022   .49995
 2.98   .00471   .49856      3.43   .00111   .49970      3.88   .00021   .49995
 2.99   .00457   .49861      3.44   .00107   .49971      3.89   .00021   .49995
 3.00   .00443   .49865      3.45   .00104   .49972      3.90   .00020   .49995
 3.01   .00430   .49869      3.46   .00100   .49973      3.91   .00019   .49995
 3.02   .00417   .49874      3.47   .00097   .49974      3.92   .00018   .49996
 3.03   .00405   .49878      3.48   .00094   .49975      3.93   .00018   .49996
 3.04   .00393   .49882      3.49   .00090   .49976      3.94   .00017   .49996
 3.05   .00381   .49886      3.50   .00087   .49977      3.95   .00016   .49996
 3.06   .00370   .49889      3.51   .00084   .49978      3.96   .00016   .49996
 3.07   .00358   .49893      3.52   .00081   .49978      3.97   .00015   .49996
 3.08   .00348   .49897      3.53   .00079   .49979      3.98   .00014   .49997
 3.09   .00337   .49900      3.54   .00076   .49980      3.99   .00014   .49997
 3.10   .00327   .49903      3.55   .00073   .49981
 3.11   .00317   .49906      3.56   .00071   .49981
 3.12   .00307   .49910      3.57   .00068   .49982
 3.13   .00298   .49913      3.58   .00066   .49983
 3.14   .00288   .49916      3.59   .00063   .49983
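
Entries of Table I can be recomputed when a printed figure is in doubt. The Python sketch below evaluates the ordinate from its definition and obtains the area from 0 to t by means of the error function in the standard `math` module, using the relation area = ½ erf(t/√2). The function names `ordinate` and `area` are illustrative.

```python
import math

def ordinate(t):
    """phi(t) = exp(-t^2 / 2) / sqrt(2*pi)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def area(t):
    """Area under phi from 0 to t, which equals erf(t / sqrt(2)) / 2."""
    return 0.5 * math.erf(t / math.sqrt(2))

for t in (0.50, 1.00, 1.96, 2.01, 3.00):
    print(f"{t:4.2f}  {ordinate(t):.5f}  {area(t):.5f}")
```

Values rounded to five decimal places may be compared directly with the corresponding lines of the table.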
Table II. Common Logarithms of Numbers to Five Decimal Places


IB 


1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts || 



043 

a 

130 

173 

217 

260 

303 

346 

389 



01 

432 

475 

518 

561 

604 

647 

689 

732 

775 

817 


44 4 $ 4.2 

02 


903 

945 

988 


*072 

*115 

*157 

*199 

*242 

1 


03 


326 

368 


452 

494 

536 

578 

m 

662 

2 

4.4 4,3 4.2 
8.8 8.6 8.4 

04 


745 

787 

828 

870 

912 

953 

995 

*036 

*078 

3 

4 

13.2 12.9 12.6 
17.6 17 2 16 8 

05 

02 119 

160 

202 

243 

284 

325 

366 

407 

449 

490 

5 

22.0 21.5 21.0 

06 

531 

572 

612 

653 

694 

735 

776 

816 

857 

898 

6 

26.4 25.8 25.2 

07 

Hi 

979 

*019 

*060 

STBI 

*141 

*181 

*222 

*262 

*302 

7 

8 

30.8 30.1 29.4 
35.2 34.4 33.6 

08 

03 342 

383 

423 

463 


543 

583 

623 

663 

703 

9 

39.6 38.7 37.8 

09 

03 743 

782 

822 

862 


941 

981 

*021 

ggjl 

U m 



■ 

04 139 

179 

218 

258 

297 

336 

376 

415 

454 

493 



11 

532 

571 

610 

650 

689 

727 

766 

805 

844 

883 



12 

04 922 

961 

999 

*038 

*077 

*115 

*154 

*192 

*231 

*269 


41 49 39 

13 

05 308 

346 

385 

423 

461 

500 

538 

576 

614 

652 

1 

4.1 4 3.9 












2 

8.2 8 7.8 

14 

05 690 

729 

767 

805 

843 

881 

918 

956 

994 

*032 

3 

12.3 12 11.7 

15 

06 070 

108 

145 

183 

221 

258 

296 

333 

371 

408 

4 

16.4 16 15.6 

16 

446 

483 

521 

558 

595 

633 

670 

707 

744 

781 

5 

« 

20.5 20 19.5 

24.6 24 23.4 

17 

06 819 

856 

893 

930 

967 

*004 

*041 

*078 

*115 

*151 

I 

28.7 28 27.3 

18 

07 188 

225 

262 

298 

335 

372 

408 

445 

482 

518 

9 

36 9 36 35 1 

19 

555 

591 

628 

664 

700 

737 

773 

809 

846 

882 



H 

07 918 

rm 

1m 

Bjssi 




mi 

*207 

*243 



21 

08 279 

314 

350 

386 

422 

458 

493 

529 

565 

EEI 



22 

636 

672 

707 

743 

778 

814 

849 

884 

920 

955 


do 37 36 

23 

08 991 

*026 

*061 

*096 

*132 

*167 

*202 

*237 

*272 

*307 

1 

2 

5.8 0.7 O.b 
7.6 7.4 7.2 

24 

09 342 

377 

412 

447 

482 

517 

552 

587 

621 

656 

3 

11.4 11.1 10.8 

25 

09 691 

726 

760 

795 

830 

864 

899 

934 

968 


5 

15.2 14.8 14.4 
l q n Ian 

26 

10 037 

072 

106 

140 

175 

209 

243 

278 

312 

346 

9 

22.8 22.2 21.6 

27 

380 

415 

449 

483 

517 

551 

585 

619 

653 

687 

7 

8 , 

26.6 25.9 25.2 
30.4 29.6 28.8 

28 

10 721 

755 

789 

823 

857 

890 

924 

958 

992 

*025 

9 1 

34.2 33.3 32.4 

29 

11059 

093 

126 

160 

193 

227 

261 

294 

327 

361 



130 

394 

428 

461 

494 

528 

561 

594 

628 

661 

694 



31 

11 727 

760 

793 

826 

860 

893 

926 

959 

992 

*024 


35 34 33 

32 

12 057 

090 

123 

156 

189 

222 

254 

287 

320' 

352 

-a 

w wx vt. 

•z C x A "Z "Z 

33 

385 

418 

450 

483 

516 

548 

581 

613 

646 

678 

JL 

2 

0.0 0.4 0.0 

7.0 6.8 6.6 












3 

10.5 10.2 9.9 

34 

12 710 

743 

775 

808 

840 

872 

905 

937 

969 

*001 

4 

14 0 13 6 13 2 

35 

13 033 

066 

098 

130 

162 

194 

226 

258 

290 

322 

5 

17^5 17.0 1 6.5 

36 

354 

386 

418 

450 

481 

513 

545 

577 

609 

640 

6 

21.0 20.4 19.8 












7 

24.5 23.8 23.1 

37 

672 

704 

735 

767 

799 

830 

862 

893 

925 

956 

8 1 

28.0 27.2 26.4 

38 

13 988 

*019 

*051 

*082 

*114 

*145 

*176 

*208 

*239 

*270 

•1 

31.5 30.6 29.7 

39 

14 301 

333 

364 

395 

426 

457 

489 

520 

551 

582 



140 

613 

644 

675 

706 

737 

768 

799 

829 

860 

891 



41 

14 922 

953 

983 

*014 

*045 

*076 

*106 

*137 

*168 

*198 


32 31 30 

42 

15 229 

259 

290 

320 

351 

381 

412 

442 

473 

503 

-f 

W/V Wv 

43 

534 

564 

594 

625 

655 

685 

715 

746 

776 

806 

JL 

2 

3.2 3.1 3 

6.4 6.2 6 

44 

15 836 

866 

897 

927 

957 

987 

*017 

*047 

*077 

*107 

3 

4 

9.6 9.3 9 

12 8 12 4 12 

45 

16 137 

167 

197 

227 

256 

286 

316 

346 

376 

406 

5 

16.0 15.5 15 

46 

435 

465 

495 

524 

554 

584 

613 

643 

673 

702 

6 

19.2 18.6 18 












7 

22.4 21.7 21 

47 

16 732 

761 

791 

820 

850 

879 

909 

938 

967 

997 

8 

25.6 24.8 24 

48 

17 026 

056 

085 

114 

143 

173 

202 

231 

260 

289 

9 

28.8 27.9 27 

49 

319 

348 

377 

406 

435 

464 

493 ! 

522 

551 

580 



150 

17 609 

638 

667 

696 

725 

754 

782 

811 

840 

869 



N 

O 

i 
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3 

4 

5 

6 

7 

8 

9 
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Table II. Common Logarithms of Numbers to Five Decimal Places 


29 

28 

2.9 

2.8 

5.8 

5.6 

8.7 

8.4 

11.6 

11.2 

14.5 

14.0 

17.4 

16.8 

20.3 

19.6 

23.2 

22.4 

26.1 

25.2 


27 26 

1 2.7 2.6 

2 5.4 5.2 

3 8.1 7.8 

4 10.8 10.4 

5 13.5 13.0 

6 16.2 15.6 

7 18.9 18.2 

8 21.6 20.8 
9 24.3 23.4


150

17 609

638 

667 

696 

725 

754 

782 

811 

840 

869 

51 

17 898 

926 

955 

984 

*013 

*041 

*070 

*099 

*127 

*156 

52 

18 184 

213 

241 

270 

298 

327 

355 

384 

412 

441 

53 

469 

498 

526 

554 

583 

611 

639 

667 

696 

724 

54 

18 752 

780 

808 

837 

865 

893 

921 

949 

977 

*005 

55 

19 033 

061 

089 

117 

145 

173 

201 

229 

257 

285 

56 

312 

340 

368 

396 

424 

451 

479 

507 

535 

562 

57 

590 

618 

645 

673 

700 

728 

756 

783 

811 

838 

58 

19 866 

893 

921 

948 

976 

*003 

*030 

*058 

*085 

*112 

59 

20 140 

167 

194 

222 

249 

276 

303 

330 

358 

385 


61 

683 

710 

737 

763 

790 

817 

844 

871 

898 

925 

62 

20 952 

978 

*005 

*032 

*059 

*085 

*112 

*139 

*165 

*192 

63 

21 219 

245 

272 

299 

325 

352 

378 

405 

431 

458 

64 

484 

511 

537 

564 

590 

617 

643 

669 

696 

722 

65 

21 748 

775 

801 

827 

854 

880 

906 

932 

958 

985

66 

22 011 

037 

063 

089 

115 

141 

167 

194 

220 

246 

67 

272 

298 

324 

350 

376 

401 

427 

453 

479 

505 

68 

531 

557 

583 

608 

634 

660 

686 

712 

737 

763 

69 

22 789 

814

840 

866 

891 

917 

943 

968 

994 

*019 

170 


172 

198 

223 

249 

274 


300 

325 

350 

376 

401

553

578 

603 

629 

654 

23 805 

830 

855 

880 

905 

24 055 

080 

105 

130 

155 

304 

329 

353 

378 

403 

551 

576 

601 

625 

650 

24 797 

822 

846 

871 

895 

25 042 

066 

091 

115 

139 

285 

310 

334 

358 

382 

527 




24 23

1 2.4 2.3

2 4.8 4.6

3 7.2 6.9

4 9.6 9.2

5 12.0 11.5

6 14.4 13.8

7 16.8 16.1

8 19.2 18.4

9 21.6 20.7



81 

25 768 

792 

816 

840 

864 

888 

912 

935 

959 

983 

82 

26 007 

031 

055 

079 

102 

126 

150 

174 

198 

221 

83 

245 

269 

293 

316 

340 

364 

387 

411 

435 

458 

84 

482

505 

529 

553 

576 

600 

623 

647 

670 

694 

85 

717 

741 

764 

788 

811 

834 

858 

881 

905 

928 

86 

26 951 

975 

998 

*021 

*045 

*068 

*091 

*114 

*138 

*161 

87 

27 184 

207 

231 

254 

277 

300 

323 

346 

370 

393 

88 

416 

439 

462 

485 

508 

531 

554 

577 

600 

623 

89 

646 

669 

692 

715 

738 

761 

784

807 

830 

852 












190

27 875 

898 

921 

944 

967 

989 

*012 

*035 

*058 

*081

91 

28 103 

126 

149 

171 

194 

217 

240 

262 

285 

307 

92 

330 

353 

375 

398 

421 

443 

466 

488 

511 

533 

93 

556 

578 

601 

623 

646 

668

691 

713 

735 

758 

94 

28 780 

803 

825 

847 

870 

892 

914 

937 

959 

981 

95 

29 003 

026 

048 

070 

092 

115 

137 

159 

181 

203 

96 

226 

248 

270 

292 

314 

336 

358 

380 

403 

425 

97 

447 

469 

491 

513 

535 

557 

579 

601 

623 

645 

98 

667 

688 

710 

732 

754 

776 

798 

820 

842 

863 

99 

29 885 

907 

929 

951 

973 

994 

*016 

*038 

*060 

*081 



N 0 1 2 3 4 5 6 7 8 9


200 30 103 125 146 168 190 211 233 255 276 298

01 320 341 363 384 406 428 449 471 492 514

02 535 557 578 600 621 643 664 685 707 728 

03 750 771 792 814 835 856 878 899 920 942 

04 30 963 984 *006 *027 *048 *069 *091 *112 *133 *154 

05 31 175 197 218 239 260 281 302 323 345 366 

06 387 408 429 450 471 492 513 534 555 576 

07 597 618 639 660 681 702 723 744 765 785 

08 31 806 827 848 869 890 911 931 952 973 994 

09 32 015 035 056 077 098 118 139 160 181 201 

210 222 243 263 284 305 325 346 366 387 408

11 428 449 469 490 510 531 552 572 593 613

12 634 654 675 695 715 736 756 777 797 818 

13 32 838 858 879 899 919 940 960 980 *001 *021 

14 33 041 062 082 102 122 143 163 183 203 224 

15 244 264 284 304 325 345 365 385 405 425 

16 445 465 486 506 526 546 566 586 606 626 

17 646 666 686 706 726 746 766 786 806 826 

18 33 846 866 885 905 925 945 965 985 *005 *025 

19 34 044 064 084 104 124 143 163 183 203 223 

220 242 262 282 301 321 341 361 380 400 420

21 439 459 479 498 518 537 557 577 596 616

22 635 655 674 694 713 733 753 772 792 811 

23 34 830 850 869 889 908 928 947 967 986 *005 

24 35 025 044 064 083 102 122 141 160 180 199 

25 218 238 257 276 295 315 334 353 372 392 

26 411 430 449 468 488 507 526 545 564 583 

27 603 622 641 660 679 698 717 736 755 774 

28 793 813 832 851 870 889 908 927 946 965 

29 35 984 *003 *021 *040 *059 *078 *097 *116 *135 *154

230 36 173 192 211 229 248 267 286 305 324 342 

31 361 380 399 418 436 455 474 493 511 530

32 549 568 586 605 624 642 661 680 698 717 

33 736 754 773 791 810 829 847 866 884 903 

34 36 922 940 959 977 996 *014 *033 *051 *070 *088 

35 37 107 125 144 162 181 199 218 236 254 273 

36 291 310 328 346 365 383 401 420 438 457 

37 475 493 511 530 548 566 585 603 621 639 

38 658 676 694 712 731 749 767 785 803 822 

39 37 840 858 876 894 912 931 949 967 985 *003 


41 202 220 238 256 274 292 310 328 346 364 

42 382 399 417 435 453 471 489 507 525 543 

43 561 578 596 614 632 650 668 686 703 721 

44 739 757 775 792 810 828 846 863 881 899 

45 38 917 934 952 970 987 *005 *023 *041 *058 *076 

46 39 094 111 129 146 164 182 199 217 235 252 

47 270 287 305 322 340 358 375 393 410 428 

48 445 463 480 498 515 533 550 568 585 602 

49 620 637 655 672 690 707 724 742 759 777 


Prop. Parts 



22 21

1 2.2 2.1

2 4.4 4.2

3 6.6 6.3

4 8.8 8.4

5 11.0 10.5

6 13.2 12.6

7 15.4 14.7

8 17.6 16.8

9 19.8 18.9


Prop. Parts

N

0

1

2

3

4

5 

6 

7 

8 

9 



250 

39 794 

811 

829 

846 

863 

881 

898 

915 

933 

950 



51 

39 967 

985 

*002 

*019 

*037 

*054 

*071 

*088 

*106 

*123 


18 

52 

40 140 

157 

175 

192 

209 

226 

243 

261 

278 

295 

1 

1.8 

53 

312 

329 

346 

364 

381 

398 

415 

432 

449 

466 

2 

3.6 












3 

5.4 

54 

483 

500 

518 

535 

552 

569 

586 

603 

620 

637 

4 

7.2 

55 

654 

671 

688 

705 

722 

739 

756 

773 

790 

807 

5 

6 

9.0 

10.8 

56 

824 

841 

858 

875 

892 

909 

926 

943 

960 

976 

7 

12.6 

57 

40 993 

*010 

*027 

*044 

*061 

*078 

*095 

*111 

*128 

*145 

8 

9

14.4

16.2

58 

41 162 

179 

196 

212 

229 

246 

263 

280 

296 

313 



59 

330 

347 

363 

380 

397 

414 

430 

447 

464 

481 



260 

497 

514 

531 

547 

564 

581 

597 

614 

631 

647 



61 

664 

681 

697 

714 

731 

747 

764 

780 

797 

814 


17 

62 

830 

847 

863 

880 

896 

913 

929 

946 

963 

979 

1 

1.7 

63 

41 996 

*012 

*029 

*045 

*062 

*078 

*095 

*111

*127 

*144 

2 

3.4 












3 

5.1 

64 

42 160 

177 

193 

210 

226 

243 

259 

275 

292 

308 

4

6.8 

65 

325 

341 

357 

374 

390 

406 

423 

439 

455 

472 

5 

6 

8.5 

10.2 

66 

488 

504 

521 

537 

553 

570 

586 

602 

619 

635 

7 

11.9 

67 

651 

667 

684 

700 

716 

732 

749 

765 

781 

797 

8 

9

13.6 

15.3 

68 

813 

830

846 

862 

878 

894 

911 

927 

943 

959 


69 

42 975 

991 

*008 

*024 

*040 

*056 

*072 

*088 

*104 

*120 



270 

43 136 

152 

169 

185 

201 

217 

233 

249 

265 

281 



71 

297 

313 

329 

345 

361 

377 

393 

409 

425 

441 


16 

72 

457 

473 

489 

505 

521 

537 

553 

569 

584 

600 

1 

1.6 

73 

616 

632 

648 

664 

680 

696 

712 

727 

743 

759 

2

3.2 












3 

4.8 

74 

775 

791 

807 

823 

838 

854 

870 

886 

902 

917 

4 

6.4 

75 

43 933 

949 

965 

981 

996 

*012 

*028 

*044 

*059 

*075 

5 

6 

8.0 

9.6 

76 

44 091 

107 

122 

138 

154 

170 

185 

201 

217 

232 

7 

8

11.2

12.8

77 

248 

264 

279 

295 

311 

326 

342 

358 

373 

389 

9

14.4

78 

404 

420 

436 

451

467 

483 

498 

514 

529 

545 



79 

560 

576 

592 

607 

623 

638 

654 

669 

685 

700 



280

44 716

731

747

762

778

793

809

824

840

855



81 

44 871 

886 

902 

917 

932 

948 

963 

979 

994 

*010 


15 

82 

45 025 

040 

056 

071 

086 

102 

117 

133 

148 

163 

1 

1.5 

83 

179 

194 

209 

225 

240 

255 

271 

286 

301 

317 

2

3.0 












3 

4.5

84 

332 

347 

362 

378 

393 

408 

423 

439 

454 

469 

4 

6.0 

85 

484 

500 

515 

530 

545 

561 

576 

591 

606 

621 

5 

6 

7.5 

9.0 

86 

637 

652 

667 

682 

697 

712 

728 

743 

758 

773 

7 

10.5 

87 

788 

803 

818 

834 

849 

864 

879 

894 

909 

924 

8 

9 

12.0 

13.5 

88 

45 939 

954 

969 

984 

*000 

*015 

*030 

*045 

*060 

*075 


89 

46 090 

105 

120 

135 

150 

165 

180 

195 

210

225 




240 

255 

270 

285 

300

315

330

345 

359 

374 



91 

389 

404 

419 

434 

449 

464 

479 

494 

509 

523 


14 

92 

538 

553 

568 

583 

598 

613 

627 

642 

657 

672 

1 

1.4 

93 

687 

702 

716 

731 

746 

761 

776 

790 

805 

820 

2

2.8 












3 

4.2 

94 

835 

850 

864 

879 

894 

909 

923 

938 

953 

967 

4 

5.6 

95 

46 982 

997 

*012 

*026 

*041 

*056 

*070 

*085 

*100 

*114 

5 

6 

7.0 

8.4 

96 

47 129 

144 

159 

173 

188 

202 

217 

232 

246 

261 

7 

8

9.8

11.2

97 

276 

290 

305 

319 

334 

349 

363 

378 

392 

407 

9

12.6

98 

422 

436 

451 

465 

480 

494 

509 

524 

538 

553 



99 

567 

582 

596 

611 

625 

640 

654 

669 

683 

698 



300

47 712 

727 

741 

756 

770 

784 

799 

813

828

842 

Prop. Parts
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0 

1 

2 

3 

4 

5 

6 

7 

8 


Prop. Parts 


300 47 712 727 741 756 770 784 799 813 828 842

01 47 857 871 885 900 914 929 943 958 972 986

02 48 001 015 029 044 058 073 087 101 116 130

03 144 159 173 187 202 216 230 244 259 273 

04 287 302 316 330 344 359 373 387 401 416 

05 430 444 458 473 487 501 515 530 544 558 

06 572 586 601 615 629 643 657 671 686 700 

07 714 728 742 756 770 785 799 813 827 841 

08 855 869 883 897 911 926 940 954 968 982 

09 48 996 *010 *024 *038 *052 *066 *080 *094 *108 *122 


310 49 136 150 164 178 192 206 220 234 248 262

11 276 290 304 318 332 346 360 374 388 402

12 415 429 443 457 471 485 499 513 527 541

13 554 568 582 596 610 624 638 651 665 679


14 693 707 721 734 748 762 776 790 803 817 

15 831 845 859 872 886 900 914 927 941 955 

16 49 969 982 996 *010 *024 *037 *051 *065 *079 *092 

17 50 106 120 133 147 161 174 188 202 215 229 

18 243 256 270 284 297 311 325 338 352 365 

19 379 393 406 420 433 447 461 474 488 501 

320 515 529 542 556 569 583 596 610 623 637

21 651 664 678 691 705 718 732 745 759 772

22 786 799 813 826 840 853 866 880 893 907 

23 50 920 934 947 961 974 987 *001 *014 *028 *041 

24 51 055 068 081 095 108 121 135 148 162 175

25 188 202 215 228 242 255 268 282 295 308 

26 322 335 348 362 375 388 402 415 428 441 

27 455 468 481 495 508 521 534 548 561 574 

28 587 601 614 627 640 654 667 680 693 706 

29 720 733 746 759 772 786 799 812 825 838 


330 851 865 878 891 904 917 930 943 957 970


31 51 983 996 *009 *022 *035 *048 *061 *075 *088 *101 

32 52 114 127 140 153 166 179 192 205 218 231 

33 244 257 270 284 297 310 323 336 349 362 

34 375 388 401 414 427 440 453 466 479 492 

35 504 517 530 543 556 569 582 595 608 621 

36 634 647 660 673 686 699 711 724 737 750 

37 763 776 789 802 815 827 840 853 866 879 

38 52 892 905 917 930 943 956 969 982 994 *007 

39 53 020 033 046 058 071 084 097 110 122 135 


340 148 161 173 186 199 212 224 237 250 263 

41 275 288 301 314 326 339 352 364 377 390

42 403 415 428 441 453 466 479 491 504 517 

43 529 542 555 567 580 593 605 618 631 643 

44 656 668 681 694 706 719 732 744 757 769 

45 782 794 807 820 832 845 857 870 882 895 

46 53 908 920 933 945 958 970 983 995 *008 *020 

47 54 033 045 058 070 083 095 108 120 133 145 

48 158 170 183 195 208 220 233 245 258 270 

49 283 295 307 320 332 345 357 370 382 394 


350 54 407 419 432 444 456 469 481 494 506 518



Prop. Parts


13 

1 

1.3 

2 

2.6 

3 

3.9 

4 

5.2 

5 

6.5 

6 

7.8 

7 

9.1 

8 

10.4 

9 

11.7 


12 

1 

1.2

2 

2.4 

3 

3.6 

4 

4.8 

5 

6.0 

6 

7.2 

7 

8.4 

8 

9.6 

9 

10.8


11 

1 

1.1 

2 

2.2 

3 

A 

3.3 



Prop. Parts


54 407 419 

432 

444 

456 

469 

481 

494 

506 

518

531 543 

555 

568 

580 

593 

605 

617 

630 

642 

654 667 

679 

691 

704 

716 

728 

741 

753 

765 

777 790 

802 

814 

827 

839 

851 

864 

876 

888 

54 900 913 

925 

937 

949 

962 

974 

986 

998 

*011 

55 023 035 

047 

060 

072 

084 

096 

108 

121 

133 

145 157 

169 

182 

194 

206 

218 

230 

242 

255 

267 279 

291 

303 

315 

328 

340 

352 

364 

376 

388 400 

413 

425 

437 

449 

461 

473 

485 

497 

509 522 

534 

546 

558 

570 

582 

594 

606 

618 

630 642 

654 

666 

678 

691 

703 

715 

727 

739 

751 763 

775 

787 

799 

811 

823 

835 

847 

859 

871 883 

895 

907 

919 

931 

943 

955 

967 

979 

55 991 *003 

*015 

*027 

*038 

*050 

*062 

*074 

*086 

*098 

56 110 122 

134 

146 

158 

170 

182 

194 

205 

217 

229 241 

253 

265 

277 

289 

301 

312 

324 

336 

348 360 

372 

384 

396 

407 

419 

431 

443 

455 

467 478 

490 

502 

514 

526 

538 

549 

561 

573 

585 597 

608 

620 

632 

644 

656 

667 

679 

691 

703 714 

726 

738 

750 

761 

773 

785 

797 

808 

820 832 

844 

855 

867 

879 

891 

902 

914 

926 

56 937 949 

961 

972 

984 

996 

*008 

*019 

*031 

*043 

57 054 066

078 

089 

101 

113 

124 

136 

148 

159 

171 183 

194 

206 

217 

229 

241 

252 

264 

276 

287 299 

310 

322 

334 

345 

357 

368 

380 

392 

403 415 

426 

438 

449 

461 

473 

484 

496 

507 

519 530 

542 

553 

565 

576 

588 

600 

611 

623 

634 646 

657 

669 

680 

692 

703 

715 

726 

738 

749 761 

772 

784 

795 

807 

818 

830 

841 

852 

864 875 

887 

898 

910 

921 

933 

944 

955 

967 

57 978 990 

*001 

*013 

*024 

*035 

*047 

*058 

*070 

*081 

58 092 104 

115 

127 

138 

149 

161 

172 

184 

195 

206 218 

229 

240 

252 

263 

274 

286 

297 

309 

320 331 

343 

354 

365 

377

388 

399 

410 

422 

433 444 

456 

467 

478 

490 

501 

512 

524 

535 

546 557 

569 

580 

591 

602 

614 

625 

636 

647 

659 670 

681 

692 

704 

715 

726 

737 

749 

760 

771 782 

794 

805 

816 

827 

838 

850 

861 

872 

883 894 

906 

917 

928 

939 

950 

961 

973 

984 

58 995 *006 

*017 

*028 

*040 

*051 

*062 

*073 

*084 

*095 

59 106 118 

129 

140 

151 

162 

173 

184 

195 

207 

218 229 

240 

251 

262 

273 

284 

295 

306 

318 

329 340 

351 

362 

373 

384 

395 

406 

417 

428 

439 450 

461 

472 

483 

494 

506 

517 

528 

539 

550 561 

572 

583 

594 

605 

616 

627 

638 

649 

660 671 

682 

693 

704 

715 

726 

737 

748 

759 

770 780 

791 

802 

813 

824 

835 

846 

857 

868 

879 890 

901 

912 

923 

934 

945 

956 

966 

977 

59 988 999 

*010 

*021 

*032 

*043 

*054 

*065 

*076 

*086 

60 097 108 

119 

130 

141 

152 

163 

173 

184 

195 

60 206 217 

228 

239 

249 

260 

271 

282 

293 

304 


Prop. Parts 


60 206

217

228

239

249

260

271

282

293

304 


384 

395 

405 

416 

426 

437 

448 

458 

469 

479 

490 

500 

511 

521 

532 

542 

553 

563 

574 

584 

595 

606 

616 

627 

637 

648 

658 

669 

679 

690 

700 

711 

721 

731 

742 

752 

763 

773 

784 

794 

805 

815 

826 

836 

847 

857 

868 

878 

888 

899 

61 909 

920 

930 

941 

951 

962 

972 

982 

993 

*003 

62 014 

024 

034 

045 

055 

066 

076 

086 

097 

107 

118 

128 

138 

149 

159 

170 

180 

190 

201 

211 

221 

232 

242 

252 

263 

273 

284 

294 

304 

315 


420 325 335 346 356 366 377 387 397 408 418


21 428 439 449 459 469 480 490 500 511 521 

22 531 542 552 562 572 583 593 603 613 624 

23 634 644 655 665 675 685 696 706 716 726 

24 737 747 757 767 778 788 798 808 818 829 

25 839 849 859 870 880 890 900 910 921 931 

26 62 941 951 961 972 982 992 *002 *012 *022 *033 

27 63 043 053 063 073 083 094 104 114 124 134 

28 144 155 165 175 185 195 205 215 225 236 

29 246 256 266 276 286 296 306 317 327 337 

430 347 357 367 377 387 397 407 417 428 438 

31 448 458 468 478 488 498 508 518 528 538

32 548 558 568 579 589 599 609 619 629 639 

33 649 659 669 679 689 699 709 719 729 739 

34 749 759 769 779 789 799 809 819 829 839 

35 849 859 869 879 889 899 909 919 929 939 

36 63 949 959 969 979 988 998 *008 *018 *028 *038 

37 64 048 058 068 078 088 098 108 118 128 137 

38 147 157 167 177 187 197 207 217 227 237 

39 246 256 266 276 286 296 306 316 326 335 

440 345 355 365 375 385 395 404 414 424 434

41 444 454 464 473 483 493 503 513 523 532

42 542 552 562 572 582 591 601 611 621 631 

43 640 650 660 670 680 689 699 709 719 729 

44 738 748 758 768 777 787 797 807 816 826 

45 836 846 856 865 875 885 895 904 914 924 

46 64 933 943 953 963 972 982 992 *002 *011 *021 

47 65 031 040 050 060 070 079 089 099 108 118 

48 128 137 147 157 167 176 186 196 205 215 

49 225 234 244 254 263 273 283 292 302 312 

450 65 321 331 341 350 360 369 379 389 398 408


Prop. Parts

N

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 



450 

65 321 

331 

341 

350 

360 

369 

379 

389 

398 

408 



51 

418 

427 

437 

447 

456 

466 

475 

485 

495 

504 



52 

514 

523 

533 

543 

552 

562 

571 

581 

591 

600 



53 

610

619 

629 

639 

648 

658 

667 

677 

686 

696 



54 

706 

715 

725 

734 

744 

753 

763 

772 

782 

792 



55 

801 

811 

820 

830 

839 

849 

858 

868 

877 

887 



56 

896 

906 

916 

925 

935 

944 

954 

963 

973 

982 

1 

1.0 

57 

65 992 

*001 

*011 

*020 

*030 

*039 

*049 

*058 

*068 

*077 

2 

2.0 

58 

66 087 

096 

106 

115 

124 

134 

143 

153 

162 

172 

3 

3.0 

59 

181 

191 

200 

210 

219 

229 

238 

247 

257 

266 

4

4.0












5 

5.0 

460

276

285

295

304

314

323

332

342

351

361

6

6.0

7 

7.0 

61 

370 

380 

389 

398 

408 

417 

427 

436 

445 

455 

8 

8.0 

62 

464 

474 

483 

492 

502 

511 

521 

530 

539 

549 

9 

9.0 

63 

558 

567 

577 

586 

596 

605 

614 

624 

633 

642 



64 

652 

661 

671 

680 

689 

699 

708 

717 

727 

736 



65 

745 

755 

764 

773 

783 

792 

801 

811 

820 

829 



66 

839 

848 

857 

867 

876 

885 

894 

904 

913 

922 



67 

66 932 

941 

950 

960 

969 

978 

987 

997 

*006 

*015 



68 

67 025 

034 

043 

052 

062 

071 

080 

089 

099 

108 



69 

117 

127 

136 

145 

154 

164 

173 

182 

191 

201 



mi 

210 

219 

228 

237 

247 

256 

265 

274 

284 

293 



71 

302 

311 

321 

330 

339 

348 

357 

367 

376 

385 


9 

72 

394 

403 

413 

422 

431 

440 

449

459 

468 

477 

1 

0.9 

73 

486 

495 

504 

514 

523 

532 

541 

550 

560 

569 

2 

1.8 












3 

2.7 

74 

578 

587 

596 

605 

614 

624 

633 

642 

651 

660 

4

3.6 

75 

669 

679 

688 

697 

706 

715 

724 

733 

742 

752 

5 

6 

4.5 

5.4 

76 

761 

770 

779 

788 

797 

806 

815 

825 

834 

843 

7 

6.3 

77 

852 

861 

870 

879 

888 

897 

906 

916 

925 

934 

8 

7.2 


78 

67 943 

952 

961 

970 

979 

988 

997 

*006 

*015 

*024 

9 

8.1

79 

68 034 

043 

052 

061 

070 

079 

088 

097 

106 

115 



480

124 

133 

142 

151 

160 

169 

178 

187 

196 

205 



81 

215 

224 

233 

242 

251 

260 

269 

278 

287 

296 



82 

305 

314 

323 

332 

341 

350 

359 

368 

377 

386 



83 

395 

404 

413 

422 

431 

440 

449 

458 

467 

476 



84 

485 

494 

502 

511 

520 

529 

538 

547 

556 

565 



85 

574 

583 

592 

601 

610 

619 

628 

637 

646 

655 



86 

664 

673 

681 

690 

699 

708 

717 

726 

735 

744 



87 

753 

762 

771 

780 

789 

797 

806 

815 

824 

833 


8 

88 

842 

851 

860 

869 

878 

886 

895 

904 

913 

922 

1 

2 

0.8 

1.6 

89 

68 931 

940 

949 

958 

966 

975 

984 

993 

*002 

*011 

3 

2.4 

490 

69 020 

028 

037

046 

055 

064 

073 

082 

090 

099 

4

3.2 












5 

4.0 

91 

108 

117 

126 

135 

144 

152 

161 

170 

179 

188 

6 

4.8 

92 

197 

205 

214 

223 

232 

241 

249 

258 

267 

276 

7 

8 

5.6 

6.4 

93 

285 

294 

302 

311 

320 

329 

338 

346 

355 

364 

9 

7.2 

94 

373 

381

390 

399 

408 

417 

425 

434 

443 

452 



95 

461 

469 

478 

487 

496 

504 

513 

522 

531 

539 



96 

548 

557 

566 

574 

583 

592 

601 

609 

618 

627 



97 

636 

644 

653 

662 

671 

679

688 

697 

705 

714 



98 

723 

732 

740 

749 

758 

767 

775 

784 

793 

801 



99 

810 

819 

827 

836 

845 

854 

862 

871 

880 

888 



500

69 897 

906 

914 

923 

932 

940 

949 

958 

966 

975 


N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts

500 

69 897 

906 

914 

923 

932 

940 

949 

958 

966 

975 



01 

69 984 

992 

*001 

*010 

*018 

*027 

*036 

*044 

*053 

*062 



02 

70 070 

079 

088 

096 

105 

114 

122 

131 

140 

148 



03 

157 

165 

174 

183 

191 

200 

209 

217 

226 

234 



04 

243 

252 

260 

269 

278 

286 

295 

303 

312 

321 



05 

329 

338

346 

355 

364 

372 

381 

389 

398 

406 



06 

415 

424 

432 

441 

449 

458 

467 

475 

484 

492 


9 

07 

501 

509 

518 

526 

535 

544 

552 

561 

569 

578 

1 

0.9 

08 

586 

595 

603 

612 

621 

629 

638 

646 

655 

663

2 

1.8 

09 

672 

680 

689 

697 

706 

714 

723 

731 

740 

749 

3 

2.7 












4

3.6

510 

757 

766 

774 

783 

791 

800 

808 

817 

825 

834 

5 

4.5 












6

5.4


11 

842 

851 

859 

868 

876 

885 

893 

902 

910 

919 

7 

6.3 

12 

70 927 

935 

944 

952 

961 

969 

978 

986 

995 

*003 

8 

7.2 

13 

71 012 

020 

029 

037 

046 

054 

063 

071 

079 

088 

9

8.1 

14 

096 

105 

113 

122 

130 

139 

147 

155 

164 

172 



15 

181 

189 

198 

206 

214 

223 

231 

240 

248 

257 



16 

265 

273 

282 

290 

299 

307 

315 

324 

332 

341 



17 

349 

357 

366 

374 

383 

391 

399 

408 

416 

425 



18 

433 

441 

450 

458 

466 

475 

483 

492 

500 

508 



19 

517 

525 

533 

542 

550 

559 

567 

575 

584 

592 



520 

600 

609 

617 

625 

634 

642 

650 

659 

667 

675 



21 

684 

692 

700 

709 

717 

725 

734 

742 

750 

759 



22 

767 

775 

784 

792 

800 

809 

817 

825 

834 

842 


8 

23 

850 

858 

867 

875 

883 

892 

900 

908 

917 

925 

1 

0.8 












2 

1.6 

24 

71 933 

941 

950 

958 

966 

975 

983 

991 

999 

*008 

3 

2.4 

25 

72 016 

024 

032 

041 

049 

057 

066 

074 

082 

090 

4 

3.2 

26 

099 

107 

115 

123 

132 

140 

148 

156 

165 

173 

5 

6 

4.0 

4.8 

27 

181 

189 

198 

206 

214 

222 

230 

239 

247 

255 

7 

5.6

28 

263 

272 

280 

288 

296 

304 

313 

321 

329 

337 

8 

9

6.4 

7.2 

29 

346 

354 

362 

370 

378 

387 

395 

403 

411 

419 


530 

428 

436 

444 

452 

460 

469 

477 

485 

493 

501 



31 

509 

518 

526 

534 

542 

550 

558

567 

575 

583 



32 

591 

599 

607 

616 

624 

632 

640 

648 

656 

665 



33 

673 

681 

689 

697 

705 

713 

722 

730 

738 

746 



34 

754 

762 

770 

779 

787 

795 

803 

811 

819 

827 



35 

835 

843 

852 

860 

868 

876 

884 

892 

900 

908 



36 

916 

925 

933 

941 

949 

957 

965 

973 

981 

989 



37 

72 997 

*006 

*014 

*022 

*030 

*038 

*046 

*054 

*062 

*070 


m 

38 

73 078 

086 

094 

102 

111 

119 

127 

135 

143 

151 


1 

39 

159 

167 

175 

183 

191 

199 

207 

215 

223 

231 

1 

2 

0.7 

1.4 

540 

239 

247 

255 

263 

272 

280 

288 

296 

304 

312 

3 

2.1 












4 

2.8 

41 

320 

328 

336 

344 

352 

360 

368 

376 

384 

392 

5 

3.5 

42 

400 

408 

416 

424 

432 

440 

448 

456 

464 

472 

6 

4.2 

43

480 

488 

496 

504 

512 

520 

528 

536 

544 

552 

7

8

4.9 

5.6 

44 

560 

568 

576 

584 

592 

600 

608 

616 

624 

632 

9 

6.3 

45 

640 

648 

656 

664 

672 

679 

687 

695 

703 

711 



46 

719 

727 

735 

743 

751 

759 

767 

775 

783 

791 



47 

799 

807 

815 

823 

830 

838 

846 

854 

862 

870 



48 

878 

886 

894 

902 

910 

918 

926 

933 

941 

949 



49 

73 957 

965 

973 

981 

989 

997 

*005 

*013 

*020 

*028 



550 

74 036 

044 

052 

060 

068 

076 

084 

092 

099 

107



N

0 
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2 
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4 
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6 

7 

8 

9 

Prop. Parts







Prop. Parts 




0 

1 

2 

3 

4 

5 

6 

7 

8 

9

550

74 036 

044 

052 

060 

068 

076 

084 

092 

099 

107 

51 

115 

123 

131 

139 

147 

155 

162 

170 

178 

186 

52 

194 

202 

210 

218 

225 

233 

241 

249 

257 

265 

53 

273 

280 

288 

296 

304 

312 

320 

327 

335 

343 

54 

351 

359 

367 

374 

382 

390 

398 

406 

414 

421 

55 

429 

437 

445 

453 

461 

468 

476 

484 

492 

500 

56 

507 

515 

523 

531 

539 

547 

554 

562 

570 

578 

57 

586 

593 

601 

609 

617 

624 

632 

640 

648 

656 

58 

663 

671 

679 

687 

695 

702 

710 

718 

726 

733 

59 

741 

749 

757 

764 

772 

780 

788 

796 

803 

811 

560 

819 

827 

834 

842 

850 

858 

865 

873 

881 

889 

61 

896 

904 

912 

920 

927 

935 

943 

950 

958 

966 

62 

74 974 

981 

989 

997 

*005 

*012 

*020 

*028 

*035

*043 

63 

75 051 

059 

066 

074 

082 

089 

097 

105 

113 

120 

64 

128 

136 

143 

151 

159 

166 

174 

182 

189 

197 

65 

205 

213 

220 

228 

236 

243 

251 

259 

266 

274 

66 

282 

289 

297 

305 

312 

320 

328 

335 

343 

351 

67 

358 

366 

374 

381 

389 

397 

404 

412 

420

427 

68 

435 

442 

450

458 

465 

473 

481 

488 

496 

504 

69 

511 

519 

526 

534 

542 

549 

557 

565 

572 

580 

570 

587 

595 

603 

610 

618 

626 

633 

641 

648 

656 

71 

664 

671 

679 

686 

694 

702 

709 

717 

724 

732 

72 

740 

747 

755 

762 

770 

778 

785 

793 

800 

808 

73 

815 

823 

831 

838 

846 

853 

861 

868 

876 

884 

74 

891 

899 

906 

914 

921 

929 

937 

944 

952 

959 

75 

75 967 

974 

982 

989 

997 

*005 

*012 

*020 

*027 

*035 

76 

76 042 

050 

057 

065 

072 

080 

087 

095 

103 

110 

77 

118 

125 

133 

140 

148 

155 

163 

170 

178 

185 

78 

193 

200 

208 

215 

223 

230 

238 

245 

253 

260 

79 

268 

275 

283 

290 

298 

305 

313 


320 

328 

335 

580 

343 

350 

358 

365 

373 

380 

388 

395 

403 

410 

81 

418 

425 

433 

440 

448 

455 

462 

470 

477 

485 

82 

492 

500 

507 

515

522 

530 

537 

545 

552 

559 

83 

567 

574 

582 

589 

597 

604 

612

619 

626 

634 

84 

641 

649 

656 

664 

671 

678 

686 

693 

701 

708 

85 

716 

723 

730 

738 

745 

753 

760 

768 

775

782 

86 

790 

797 

805 

812 

819 

827 

834 

842 

849 

856 

87 

864 

871 

879 

886 

893 

901 

908 

916 

923

930 

88 

76 938 

945 

953 

960 

967 

975 

982 

989 

997 

*004 

89 

77 012

019 

026 

034 

041 

048 

056 

063 

070 

078 


Prop. Parts 


159 

166 

173 

181 

188 

195 

203 

210 

217 

225 

232 

240 

247 

254 

262 

269 

276 

283 

291 

298 

305 

313 

320 

327 

335 

342 

349 

357 

364 

371 

379 

386 

393 

401 

408 

415 

422 

430 

437 

444 

452 

459 

466 

474 

481 

488 

495 

503 

510 

517 

525 

532 

539 

546 

554 

561 

568 

576 

583 

590 

597 

605 

612 

619 

627 

634 

641 

648 

656 

663 

670 

677 

685 

692 

699 

706 

714 

721 

728 

735 

743 

750 

757 

764 

772 

779 

786 

793 

801 

808 

77 815 

822 

830 

837 

844 

851 

859 

866 

873 

880 

0 

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ID 

0 

1 

2 

3 

4 

5

6

7

8 

9 

Prop. Parts

600

77 815 

822 

830 

837 

844 

851 

859 

866 

873 

880 



01 

887 

895 

902 

909 

916 

924 

931 

938 

945 

952 



02 

77 960 

967 

974 

981 

988 

996 

*003 

*010 

*017 

*025 



03

78 032 

039 

046 

053 

061 

068 

075 

082 

089 

097 



04 

104 

111 

118 

125 

132 

140 

147 

154 

161 

168 



05 

176 

183 

190 

197 

204 

211 

219 

226 

233 

240 



06 

247 

254 

262 

269 

276 

283 

290 

297 

305 

312 


8

07 

319 

326 

333 

340 

347 

355 

362 

369 

376 

383 

1 

0.8 

08 

390 

398 

405 

412 

419 

426 

433 

440 

447 

455 

2 

1.6 

09 

462 

469 

476 

483 

490 

497 

504 

512 

519 

526 

3 

2.4 













4

3.2

610

533 

540 

547 

554 

561 

569 

576 

583 

590 

597 

5 

6 

4.0 

4.8 

11 

604 

611 

618 

625 

633 

640 

647 

654 

661 

668 

7 

5.6 

12 

675 

682 

689 

696 

704 

711 

718 

725 

732 

739 

8 

6.4 

13 

746 

753 

760 

767 

774 

781 

789 

796 

803 

810 

9 

7.2 

14 

817 

824 

831 

838 

845 

852 

859 

866 

873 

880 



15 

888 

895 

902 

909 

916 

923 

930 

937 

944 

951 



16 

78 958 

965 

972 

979 

986 

993 

*000 

*007 

*014 

*021 



17 

79 029 

036 

043 

050 

057 

064 

071 

078 

085 

092 



18 

099 

106 

113 

120 

127 

134 

141 

148 

155 

162 



19 

169 

176 

183 

190 

197 

204 

211 

218 

225 

232 



620 

239 

246 

253 

260 

267 

274 

281 

288 

295 

302 



21 

309 

316 

323 

330 

337 

344 

351 

358 

365 

372 



22 

379 

386 

393 

400 

407 

414 

421 

428 

435 

442 


7 

23 

449 

456 

463 

470 

477 

484 

491 

498 

505 

511 

1 

0.7 












2 

1.4 

24 

518 

525 

532 

539 

546 

553 

560 

567 

574 

581 

3 

2.1 

25 

588 

595 

602 

609 

616 

623 

630 

637 

644 

650 

4 

2.8 

26 

657 

664 

671 

678 

685 

692 

699 

706 

713 

720 

5 

6 

3.5 

4.2 

27 

727 

734 

741 

748 

754 

761 

768 

775 

782 

789 

7 

4.9 

28 

796 

803 

810 

817 

824 

831 

837 

844 

851 

858 

8 

9

5.6

6.3

29 

865 

872 

879 

886 

893 

900 

906 

913 

920 

927 



630 

79 934 

941 

948 

955 

962 

969 

975 

982 

989 

996 



31 

80 003 

010 

017 

024 

030 

037 

044 

051 

058 

065 



32 

072

079 

085 

092 

099 

106 

113 

120 

127 

134 



33 

140

147 

154 

161 

168 

175 

182 

188 

195 

202 



34 

209 

216 

223 

229 

236 

243 

250 

257 

264 

271 



35 

277 

284 

291 

298 

305 

312 

318 

325 

332 

339 



36 

346 

353 

359 

366 

373 

380 

387 

393 

400 

407 



37 

414 

421 

428 

434 

441 

448 

455 

462 

468 

475 



38 

482 

489 

496 

502 

509 

516 

523 

530 

536 

543 


6 

39 

550 

557 

564 

570 

577 

584 

591 

598 

604 

611 

1 

2

0.6 

1.2 

640 

618 

625 

632 

638 

645 

652 

659 

665 

672 

679 

3 

1.8 












4 

2.4 

41 

686 

693 

699 

706 

713 

720 

726 

733 

740 

747 

5 

3.0 

42 

754 

760 

767 

774 

781 

787 

794 

801 

808 

814 

6 

3.6 

43 

821 

828 

835 

841 

848 

855 

862 

868 

875 

882 

7 

4.2 












8 

4.8 

44 

889 

895 

902 

909 

916 

922 

929 

936 

943 

949 

9 

5.4 

45 

80 956 

963 

969 

976 

983 

990 

996 

*003 

*010 

*017 



46 

81 023 

030 

037 

043 

050 

057 

064 

070 

077 

084 



47 

090 

097 

104 

111 

117 

124 

131 

137 

144 

151 



48 

158 

164 

171 

178 

184 

191 

198 

204 

211 

218 



49 

224 

231 

238 

245 

251 

258 

265 

271 

278 

285 



650

81 291 

298 

305 

311

318 

325 

331 

338 

345 

351 



N

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts










Table II. Common Logarithms of Numbers to Five Decimal Places 


N 

0 

1 

2 

3 

4 

5 

6

7 

8 

9 

Prop. Parts

700 

84 510 

516 

522 

528 

535 

541 

547 

553 

559 

566 



01 

572 

578 

584 

590 

597 

603 

609 

615 

621 

628 



02 

634 

640 

646 

652 

658 

665 

671 

677 

683 

689 



03 

696 

702 

708 

714 

720 

726 

733 

739 

745 

751 



04 

757 

763 

770 

776 

782 

788 

794 

800 

807 

813 



05 

819 

825 

831 

837 

844 

850 

856 

862 

868 

874 



06 

880 

887 

893 

899 

905 

911 

917 

924 

930 

936 


7 

07 

84 942 

948 

954 

960 

967 

973 

979 

985 

991 

997 

1 

0.7 

08 

85 003 

009 

016 

022 

028 

034 

040 

046 

052 

058 

2 

3 

1.4 

2.1 

2.8 

09 

065 

071 

077 

083 

089 

095 

101 

107 

114 

120 

4 

710 

126 

132 

138 

144 

150 

156 

163 

169 

175 

181 

5 

6 

3.5 

4.2 

11 

187 

193 

199 

205 

211 

217 

224 

230 

236 

242 

7

4.9 

12 

248 

254 

260 

266 

272 

278 

285 

291 

297 

303 

8 

5.6 

6.3 

13 

309 

315 

321 

327 

333 

339 

345 

352 

358 

364 


14 

370 

376 

382 

388 

394 

400 

406 

412 

418 

425 



15 

431 

437 

443 

449 

455 

461 

467 

473 

479 

485 



16 

491 

497 

503 

509 

516 

522 

528 

534 

540 

546 



17 

552 

558 

564 

570 

576 

582 

588 

594 

600 

606 



18 

612 

618 

625 

631 

637 

643 

649 

655 

661 

667 



19 

673 

679 

685 

691 

697 

703 

709 

715 

721 

727 



720 

733 

739 

745 

751 

757 

763 

769 

775 

781 

788 



21 

794 

800 

806 

812 

818 

824 

830 

836 

842 

848 


6 

22 

854 

860 

866 

872 

878 

884 

890 

896 

902 

908 


23 

914 

920 

926 

932 

938 

944 

950 

956 

962 

968 

1 

0.6 











2 

1.2 

24 

85 974 

980 

986 

992 

998 

*004 

*010 

*016 

*022 

*028 

3 

1.8 

25 

86 034 

040 

046 

052 

058 

064 

070 

076 

082 

088 

4 

2.4

26 

094

100 

106 

112 

118 

124 

130 

136 

141 

147 

5 

6 

3.0 

3.6 

27 

153 

159 

165 

171 

177 

183 

189 

195 

201 

207 

7 

8 
9 

4.2 

4.8 

5.4 

28 

213 

219 

225 

231 

237 

243 

249 

255 

261 

267 

29 

273 

279 

285 

291 

297 

303 

308

314 

320 

326 

730 

332 

338 

344 

350 

356 

362 

368

374 

380 

386 



31 

392 

398 

404 

410 

415 

421

427

433 

439 

445 



32 

451 

457 

463 

469 

475 

481 

487 

493 

499 

504 



33 

510 

516 

522 

528 

534 

540 

546 

552 

558 

564 



34 

570 

576 

581 

587 

593 

599 

605 

611 

617 

623 



35 

629 

635 

641 

646 

652 

658 

664 

670 

676 

682 



36 

688 

694 

700 

705 

711 

717 

723 

729 

735 

741 



37 

747 

753 

759 

764 

770 

776 

782 

788 

794 

800 


5 

38 

806 

812 

817 

823 

829 

835 

841 

847 

853 

859 


39 

864 

870 

876 

882 

888 

894 

900 

906 

911 

917 

1 

2 

0.5 

1.0 

740 

923 

929 

935 

941 

947 

953 

958 

964 

970 

976 

3 

4 

1.5 

2.0 

41 

86 982 

988 

994 

999 

*005 

*011 

*017 

*023 

*029 

*035 

5 

6 

2.5 

3.0 

42 

87 040 

046 

052 

058 

064 

070 

075 

081 

087 

093 

43 

099 

105 

111 

116 

122 

128 

134 

140 

146 

151 

7 

8 

3.5 

4.0 

44 

157 

163 

169 

175 

181 

186 

192 

198 

204 

210 

9 

4.5 

45 

216 

221 

227 

233 

239 

245 

251 

256 

262 

268 



46 

274 

280 

286 

291 

297 

303 

309 

315 

320 

326 



47 

332 

338 

344 

349 

355 

361 

367 

373 

379 

384 



48 

390 

396 

402 

408 

413 

419 

425 

431 

437 

442 



49 

448 

454 

460 

466 

471 

477 

483 

489 

495 

500 



750 

87 506 

512 

518 

523 

529 

535 

541 

547 

552 

558 



N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts





Prop. Parts 


Prop. Parts 


87 506

512 

518 

523 

529 

535 

541 

547 

552 

558 

564 

570 

576 

581 

587 

593 

599 

604 

610 

616 

622 

628 

633 

639 

645 

651 

656 

662 

668 

674 

679 

685 

691 

697 

703 

708 

714 

720 

726 

731 

737 

743 

749 

754 

760 

766 

772 

777 

783 

789 

795 

800 

806 

812 

818 

823 

829 

835 

841 

846 

852 

858 

864 

869 

875 

881 

887 

892 

898 

904 

910 

915 

921 

927 

933 

938 

944 

950 

955 

961 

87 967 

973 

978 

984 

990 

996 

*001 

*007 

*013 

*018 

88 024 

030 

036 

041 

047 

053 

058 

064 

070 

076 

081 

087 

093 

098 

104 

110 

116 

121 

127 

133 

138 

144 

150 

156 

161 

167 

173 

178 

184 

190 

195 

201 

207 

213 

218 

224 

230 

235 

241 

247 

252 

258 

264 

270 

275 

281 

287 

292 

298 

304 

309 

315 

321 

326 

332 

338 

343 

349 

355 

360 

366 

372 

377 

383 

389 

395 

400 

406 

412 

417 

423 

429 

434 

440 

446 

451 

457 

463 

468 

474 

480 

485 

491 

497 

502 

508 

513 

519 

525 

530 

536 

542 

547 

553 

559 

564 

570 

576 

581 

587 

593 

598 

604 

610 

615 

621 

627 

632 

638 

643 

649 

655 

660 

666 

672 

677 

683 

689 

694 

700 

705 

711 

717 

722 

728 

734 

739 

745 

750 

756 

762 

767 

773 

779 

784 

790 

795 

801 

807 

812 

818 

824 

829 

835 

840 

846 

852 

857 

863 

868 

874

880 

885 

891 

897 

902 

908 

913 

919 

925 

930 

936 

941 

947 

953 

958 

964 

969 

975 

981 

88 986 

992 

997 

*003 

*009 

*014 

*020 

*025 

*031 

*037 

89 042 

048 

053 

059 

064 

070 

076 

081 

087 

092 

098 

104 

109 

115 

120 

126

131 

137 

143 

148 

154 

159 

165 

170 

176 

182 

187 

193 

198 

204 

209 

215 

221 

226 

232 

237 

243 

248 

254 

260 

265 

271 

276 

282 

287 

293 

298 

304 

310 

315 

321 

326 

332 

337 

343 

348 

354 

360 

365 

371 

376 

382 

387 

393 

398 

404 

409 

415 

421 

426 

432 

437 

443 

448 

454 


459 

465 

470 

476 

481 

487

492 

498 

504 

509 

515 

520 

526 

531 

537 

542 

548 

553 

559 

564 

570 

575 

581 

586 

592 

597 

603 

609 

614 

620 

625 

631 

636 

642 

647 

653 

658 

664 

669 

675 

680 

686 

691 

697 

702 

708 

713 

719 

724 

730 

735 

741 

746 

752 

757 

763 

768 

774 

779 

785 

790 

796 

801 

807 

812 

818 

823 

829 

834 

840 

845 

851 

856 

862 

867 

873 

878 

883 

889 

894 

900 

905 

911 

916 

922 

927 

933 

938 

944 

949 

955 

960 

966 

971 

977 

89 982 

988 

993 

998 

*004 

*009 

*015 

*020 

*026 

*031 

90 037 

042 

048 

053 

059 

064 

069 

075 

080

086 

091 

097 

102 

108 

113 

119 

124 

129 

135 

140 

146 

151 

157 

162 

168 

173 

179 

184 

189 

195 

200 

206 

211 

217 

222 

227 

233 

238 

244 

249 

255 

260 

266 

271 

276 

282 

287 

293 

298 

304 

90 309 

314 

320 

325 

331 

336 

342 

347

352 

358 

0 

1 

2 

3 

4 

5 

6 

7

8 

9 





















N

0

1

2

3

4

5

6

7

8

9

Prop. Parts

800

90 309 

314 

320 

325 

331 

336 

342 

347 

352 

358 



01 

363 

369 

374 

380 

385 

390 

396 

401 

407 

412 



02 

417 

423 

428 

434 

439 

445 

450 

455 

461 

466 



03 

472 

477 

482 

488 

493 

499 

504 

509 

515 

520 



04 

526 

531 

536 

542 

547 

553 

558 

563 

569 

574 



05 

580 

585 

590 

596 

601 

607 

612 

617 

623 

628 



06 

634 

639 

644 

650 

655 

660 

666 

671 

677 

682 



07 

687 

693 

698 

703 

709 

714 

720 

725 

730 

736 



08 

741 

747 

752 

757 

763 

768 

773 

779 

784 

789 



09 

795 

800 

806 

811 

816 

822 

827 

832 

838 

843 



810

849 

854 

859 

865 

870 

875 

881 

886 

891 

897 



11 

902 

907 

913 

918 

924 

929 

934 

940 

945 

950 



12 

90 956 

961 

966 

972 

977 

982 

988 

993 

998 

*004 


6 

13 

91 009 

014 

020 

025 

030 

036 

041 

046 

052 

057 

1 

0.6 












2 

1.2 

14 

062 

068 

073 

078 

084 

089 

094 

100 

105 

110 

3 

1.8 

15 

116 

121 

126 

132 

137 

142 

148 

153 

158 

164 

4 

2.4 

16 

169 

174 

180 

185 

190 

196 

201 

206 

212 

217 

5 

3.0 












6 

3.6 

17 

222 

228 

233 

238 

243 

249 

254 

259 

265 

270 

7 

4.2 

18 

275 

281 

286 

291 

297 

302 

307 

312 

318 

323 

8 

4.8 

19 

328 

334 

339 

344 

350 

355 

360 

365 

371 

376 

9 

5.4 

820

381 

387 

392 

397 

403 

408 

413 

418 

424 

429 



21 

434 

440 

445 

450 

455 

461 

466 

471 

477 

482 



22 

487 

492 

498 

503 

508 

514 

519 

524 

529 

535 



23 

540 

545 

551 

556 

561 

566 

572 

577 

582 

587 



24 

593 

598 

603 

609 

614 

619 

624 

630 

635 

640 



25 

645 

651 

656 

661 

666 

672 

677 

682 

687 

693 



26 

698 

703 

709 

714 

719 

724 

730 


735 

740 

745 



27 

751 

756 

761 

766 

772 

777 

782 

787 

793 

798 



28 

803 

808 

814 

819 

824 

829 

834 

840 

845 

850 



29 

855 

861 

866 

871 

876 

882 

887

892 

897 

903 



830

908 

913 

918 

924 

929 

934 

939 

944 

950 

955 



31 

91 960 

965 

971 

976 

981 

986 

991 

997 

*002 

*007 


5 

32 

92 012 

018 

023 

028 

033 

038 

044 

049 

054 

059 

1 

0.5 

33 

065 

070 

075 

080 

085 

091 

096 

101 

106 

111 

2 

1.0 












3 

1.5 

34 

117 

122 

127 

132 

137 

143 

148 

153 

158 

163 

4 

2.0 

35 

169 

174 

179 

184 

189 

195 

200 

205 

210 

215 

5 

2.5 

36 

221 

226 

231 

236

241 

247 

252 

257 

262 

267 

6 

3.0 












7 

3.5 

37 

273 

278 

283 

288 

293 

298 

304 

309 

314 

319 

8 

4.0 

38 

324 

330 

335 

340 

345 

350 

355 

361 

366 

371 

9 

4.5 

39 

376 

381 

387 

392 

397 

402 

407 

412 

418 

423 




840

428

433 

438 

443 

449 

454 

459 

464 

469 

474 



41 

480 

485 

490 

495 

500 

505 

511 

516 

521 

526 



42 

531 

536 

542 

547 

552 

557 

562 

567 

572 

578 



43 

583 

588 

593

598 

603 

609 

614 

619 

624 

629 



44 

634 

639 

645 

650 

655 

660 

665 

670 

675 

681 



45 

686 

691 

696 

701 

706 

711 

716 

722 

727 

732 



46 

737 

742 

747 

752 

758 

763 

768 

773 

778 

783 



47 

788

793 

799 

804 

809 

814 

819 

824 

829 

834 



48 

840 

845 

850 

855 

860 

865 

870 

875 

881 

886 



49 

891 

896 

901 

906 

911 

916 

921 

927 

932 

937 



850

92 942 

947 

952 

957 

962 

967 

973 

978 

983

988 



N

0 

1 

2 

3 

4 

5 

6 

7

8

9

Prop. Parts























Prop. Parts 


N

0

1

2

3

4

5

6 

7 

8 

9 

850 

92 942 

947 

952 

957 

962 

967 

973 

978 

983 

988 

51 

92 993 

998 

*003 

*008 

*013 

*018 

*024 

*029 

*034 

*039 

52 

93 044 

049 

054 

059 

064 

069 

075 

080 

085 

090 

53 

095 

100 

105 

110

115 

120 

125 

131 

136 

141 

54 

146 

151 

156 

161 

166 

171 

176 

181 

186 

192 

55 

197 

202 

207 

212 

217 

222 

227 

232 

237 

242 

56 

247 

252 

258 

263 

268 

273 

278 

283 

288 

293 

57 

298 

303 

308 

313 

318 

323 

328 

334 

339 

344 

58 

349 

354 

359 

364 

369 

374 

379 

384 

389 

394 

59 

399 

404 

409 

414 

420 

425 

430 

435 

440 

445 

860

450 

455 

460 

465 

470 

475 

480 

485 

490

495 

61 

500 

505 

510 

515 

520 

526 

531 

536 

541 

546 

62 

551 

556 

561 

566 

571 

576 

581 

586 

591 

596 

63 

601 

606 

611 

616 

621 

626 

631 

636 

641 

646 

64 

651 

656 

661 

666 

671 

676 

682 

687 

692 

697 

65 

702 

707 

712 

717 

722 

727 

732 

737 

742 

747 

66 

752

757 

762 

767 

772 

777 

782 

787 

792 

797 

67 

802 

807 

812 

817 

822 

827 

832 

837 

842 

847 

68 

852 

857 

862 

867 

872 

877 

882 

887 

892 

897

69 

902 

907 

912 

917 

922 

927 

932 

937 

942 

947

870 




7 

3.5 

8 

4.0 

9

4.5 


4 

1

0.4 

2 

0.8 

3 

1.2 

4 

1.6 

5

2.0

6

2.4

7

2.8

8

3.2

9

3.6


94 002 

007 

012 

017 

022 

027 

032 

037 

042 

047 

052 

057 

062 

067 

072 

077 

082 

086 

091 

096 

101 

106 

111 

116 

121 

126 

131 

136 

141 

146 

151 

156 

161 

166 

171 

176 

181 

186 

191 

196 

201 

206 

211 

216 

221 

226 

231 

236 

240 

245 

250 

255 

260 

265 

270 

275 

280 

285 

290 

295 

300 

305 

310 

315 

320 

325 

330 

335 

340 

345 

349 

354 

359 

364 

369 

374 

379 

384 

389 

394 

399 

404 

409 

414 

419 

424 

429 

433 

438 

443


81 498 503 507 512 517 522 527 532 537 542 

82 547 552 557 562 567 571 576 581 586 591 

83 596 601 606 611 616 621 626 630 635 640 

84 645 650 655 660 665 670 675 680 685 689 

85 694 699 704 709 714 719 724 729 734 738 

86 743 748 753 758 763 768 773 778 783 787 


87 792 797 802 807 812 817 822 827 832 836

88 841 846 851 856 861 866 871 876 880 885

89 890 895 900 905 910 915 919 924 929 934


91 94 988 993 998 *002 *007 *012 *017 *022 *027 *032 

92 95 036 041 046 051 056 061 066 071 075 080 

93 085 090 095 100 105 109 114 119 124 129 

94 134 139 143 148 153 158 163 168 173 177 

95 182 187 192 197 202 207 211 216 221 226 

96 231 236 240 245 250 255 260 265 270 274 

97 279 284 289 294 299 303 308 313 318 323 

98 328 332 337 342 347 352 357 361 366 371 

99 376 381 386 390 395 400 405 410 415 419 

900 95 424 429 434 439 444 448 453 458 463 468





















N

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts


95 424 

429 

434 

439 

444 

448 

453 

458 

463 

468 



01 

472 

477 

482 

487 

492 

497 

501 

506 

511 

516 



02 

521 

525 

530 

535 

540 

545 

550 

554 

559 

564 



03 

569 

574 

578 

583 

588 

593 

598 

602 

607 

612 



04 

617 

622 

626 

631 

636 

641 

646 

650 

655 

660 



05 

665 

670 

674 

679 

684 

689 

694 

698 

703 

708 



06 

713 

718 

722

727 

732 

737 

742 

746 

751 

756 



07 

761 

766 

770 

775 

780 

785 

789 

794 

799 

804 



08

809 

813 

818 

823 

828 

832 

837 

842 

847 

852 



09 

856 

861 

866 

871 

875 

880 

885 

890 

895 

899 




910

904

909 

914 

918 

923 

928 

933 

938 

942 

947 



11 

952 

957 

961 

966 

971 

976 

980 

985 

990 

995 



12 

95 999 

*004 

*009 

*014

*019 

*023 

*028 

*033 

*038 

*042 


5 

13 

96 047 

052 

057 

061 

066 

071 

076 

080 

085 

090 

1 

0.5 












2 

1.0 

14 

095 

099 

104 

109 

114 

118 

123 

128 

133 

137 

3 

1.5 

15 

142 

147 

152 

156 

161 

166 

171 

175 

180 

185 

4 

2.0 

16 

190 

194 

199 

204 

209 

213 

218 

223 

227 

232 

5 

2.5 












6 

3.0 

17 

237 

242 

246 

251 

256 

261 

265 

270 

275 

280 

7 

3.5 

18

284 

289 

294 

298 

303 

308 

313 

317 

322 

327 

8 

4.0 

19 

332 

336 

341 

346 

350 

355 

360 

365 

369 

374 

9

4.5 

920 

379 

384 

388 

393 


398 

402 

407 

412 

417 

421 



21 

426 

431 

435 

440 

445 

450 

454 

459 

464 

468 



22 

473 

478 

483 

487 

492 

497 

501 

506 

511 

515 



23 

520 

525 

530 

534 

539 

544 

548 

553 

558 

562 



24 

567 

572 

577 

581 

586 

591 

595 

600 

605 

609 



25 

614 

619 

624 

628 

633 

638 

642 

647 

652 

656 



26 

661

666 

670 

675

680 

685 

689 

694 

699 

703 



27 

708 

713 

717 

722 

727 

731 

736 

741 

745 

750 



28 

755 

759 

764 

769 

774 

778 

783 

788 

792 

797 



29 

802 

806 

811 

816 

820 

825 

830 

834 

839 

844 



930 

848 

853 

858 

862 

867 

872 

876 

881 

886 

890 



31 

895 

900 

904 

909 

914 

918 

923 

928 

932 

937 


4 

32 

942 

946 

951 

956 

960 

965 

970 

974 

979 

984 

1 

0.4 

33 

96 988 

993

997 

*002 

*007 

*011 

*016 

*021 

*025 

*030 

2 

0.8 












3 

1.2 

34 

97 035 

039 

044 

049 

053 

058 

063 

067 

072 

077 

4 

1.6 

35 

081 

086 

090 

095 

100 

104 

109 

114 

118 

123 

5 

2.0 

36 

128 

132 

137 

142 

146 

151 

155 

160 

165 

169 

6 

2.4 












7 

2.8 

37 

174 

179 

183 

188 

192 

197 

202 

206 

211 

216 

8 

3.2 

38 

220 

225

230 

234 

239 

243 

248 

253 

257 

262 

9 

3.6 

39

267 

271 

276 

280 

285 

290 

294 

299 

304 

308 



940 

313 

317 

322 

327 

331 

336 

340 

345 

350 

354 



41 

359 

364 

368 

373 

377 

382 

387 

391 

396 

400



42 

405 

410 

414 

419 

424 

428 

433 

437 

442 

447 



43 

451 

456 

460 

465 

470 

474 

479 

483 

488 

493 



44 

497 

502 

506 

511 

516 

520 

525 

529 

534 

539 



45 

543 

548 

552 

557 

562 

566 

571 

575 

580 

585



46 

589 

594 

598 

603 

607 

612 

617 

621 

626 

630 



47

635 

640 

644 

649 

653 

658 

663 

667 

672 

676 



48

681 

685 

690 

695 

699 

704 

708 

713 

717 

722 



49

727 

731 

736 

740 

745 

749 

754 

759 

763 

768 



950 

97 772 

777 

782 

786 

791 

795 

800 

804 

809 

813



N

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Prop. Parts

















N

0

1

2

3

4

5

6

7

8

9

950

97 772 

777 

782 

786 

791 

795 

800 

804 

809 

813 

818 

823 

827 

832 

836 

841 

845 

850 

855 

859 

864 

868 

873 

877 

882 

886 

891 

896 

900 

905 

909 

914 

918 

923 

928 

932 

937 

941 

946 

950 

97 955 

959 

964 

968 

973 

978 

982 

987 

991 

996 

98 000

005 

009 

014 

019 

023 

028 

032 

037 

041 

046 

050 

055 

059 

064 

068 

073 

078 

082 

087 

091 

096 

100 

105 

109 

114 

118 

123 

127 

132 

137 

141 

146 

150 

155 

159 

164 

168 

173 

177 

182 

186 

191 

195 

200 

204 

209 

214 

218 

223 

227 

232 

236 

241 

245 

250 

254 

259 

263 

268 

272 

277 

281 

286 

290 

295 

299 

304 

308 

313 

318 

322 

327 

331 

336 

340 

345 

349 

354 

358 

363 

367 

372 

376 

381 

385 

390 

394 

399 

403 

408 

412 

417 

421 

426 

430 

435 

439 

444 

448 

453 

457 

462 

466 

471 

475 

480 

484 

489 

493 

498 

502 

507 

511 

516 

520 

525 

529 

534 

538 

543 

547 

552 

556 

561 

565 

570 

574 

579 

583 

588 

592 

597 

601 

605 

610 

614 

619 

623 

628 

632 

637 

641 

646 

650 

655 

659 

664 

668 

673 

677 

682 

686 

691 

695 

700 

704 

709 

713 

717 











722 

726 

731 

735 

740 

744 

749 

753 

758 

762 

767 

771

776 

780 

784 

789 

793 

798 

802 

807 

811 

816 

820 

825 

829 

834 

838 

843 

847 

851 

856 

860 

865 

869 

874 

878 

883

887 

892 

896 

900 

905 

909 

914 

918 

923 

927 

932 

936 

941

945 

949 

954 

958 

963 

967 

972 

976 

981 

985

98 989 

994 

998 

*003 

*007 

*012 

*016 

*021 

*025 

*029 

99 034 

038 

043 

047 

052 

056 

061 

065 

069 

074 

078 

083 

087 

092 

096 

100 

105 

109 

114 

118 











123 

127 

131 

136 

140 

145 

149 

154 

158 

162 

167

171 

176 

180

185

189 

193 

198 

202 

207



91 607 612 616 621 625 629 634 638 642 647

92 651 656 660 664 669 673 677 682 686 691

93 695 699 704 708 712 717 721 726 730 734

94 739 743 747 752 756 760 765 769 774 778

95 782 787 791 795 800 804 808 813 817 822

96 826 830 835 839 843 848 852 856 861 865

97 870 874 878 883 887 891 896 900 904 909

98 913 917 922 926 930 935 939 944 948 952

99 99 957 961 965 970 974 978 983 987 991 996


1000 00 000 004 009 013 017 022 026 030 035 039
INDEX


Arithmetic mean, 33 
short methods of computing, 39 
of sub-sets, 44, 193 
Array, 193 

Asymmetry, see skewness 
Averages, Chapter III 

discussion of different, 51, 52-58

Average deviation, see mean deviation

Burr, I. W., 111 ft. nt.

Charlier check, 66, 87 
Charts, 24 
ratio, 157 

Classification of data, 9-15 
Class 

boundary, 15 
interval, 11 
limits, 15 
marks, 11 
mid-value of, 11 
Coefficient 
of alienation, 185 
of correlation, Chapter VIII
of variation, 90 
Collateral reading, 5 
Combination of sets, 99 
Compound interest law, 156 
Computing machines, 4, 71 
Constant, 7 
Correlation 
and regression, 178 
coefficient, Chapter VIII 
rank, 222 
ratio, 212 

relation to common causes, 225

interpretation of, 225 



intraclass, 232 
surface, 208 
table, 189 

Cumulative frequencies, 16, 27, 132

Curve of error, see normal curve 
Curve fitting, Chapter VII, 124 
Curves of growth, 53, 152, 164, 166

Deviation, 36 
mean or average, 84 
root-mean-square, 87 ft. nt., 99

Dispersion, see measures of
relative, 90
Dwyer, P. S., 176 
Estimate, standard error of, 179 
Frequency 
curves, 25, 112 
distributions, Chapter I 
graphical representation of, 
Chapter II 
polygon, 24 

Function, definition, 22 
exponential, 152 
frequency, 112 
linear, 137 
parabolic, 162 
quadratic, 138 
Geometric mean, 52 
Gompertz curve, 164 
Graduation by means of normal curve, 128

Graphical representation, Chapters II, VII
Harmonic mean, 55 
Histogram, 25 
Hotelling, H., 167 ft. nt. 



Huntington, E. V., 167
Kendall, M. G., 51 ft. nt. 
Kurtosis, 73, 109 
Least-squares method, 144 
Logarithmic paper, 161 
Logistic curve, 166 
Makeham’s law, 167 
Mean 

arithmetic, 33 
geometric, 52 
harmonic, 55 
of means, 43 
Mean deviation, 84 
Measures of dispersion, Chapter V

mean deviation, 84 
quartiles, 82 

semi-interquartile range, 82 
standard deviation, 86 
Median, 47 
Mode, 47 

Moment of a distribution, Chapter IV

method of, 141 
Normal curve, Chapter VI 
explanation of tables of, 116 
fitted to observed data, 124 
properties of, 118 
standard form of, 115 
Normal equations, 145 
Ogive, 27 

Parabola, fitting a, 162 
Parameter, 115, 124, 141, 153 
Percentiles, 84 
Probability, 131 
Probability paper, 132 
Quartiles, 82 
of normal curve, 119 
Range, 16 
Ratio charts, 157 
Reed-Peatj.-fta£S &uJ-6 6 


Regression
coefficients, 178 
linear, 177 
non-linear, 212 
testing linearity of, 217 
Residuals, 144 
Rietz, H. L., 114 ft. nt. 

Scatter diagram, 171 
Semi-logarithmic paper, 157 
Sheppard’s corrections, 78, 88 
Shewhart, W. A., 75 
Skewness, 73, 109 
Snedecor, G. W., 178 
Standard units, 69 
Statistic, 124 
Standard deviation, 68 
of combination of sets, 99 
of grouped data, 86 
of ungrouped data, 93 
Straight line, 137 
fitting to data, 140 
Symmetry, 73, 109 
Tables 

areas under normal curve, Appendix
logarithms of numbers, Appendix
ordinates of normal curve, Appendix
Tabulation, 9 
Time series, 150 
Translation of axes, 36 
Trend, 150, 162 
Variability, see dispersion 
Variable, 7 
Variance, 87 
Variates, 7 

Walker, Helen M., 199 
Weighted mean, 33 
Wilkens, J. E., 74 ft. nt.
Wilson, E. R., 226 



